Skip to content

Conversation

@LinZhihao-723
Copy link
Member

@LinZhihao-723 LinZhihao-723 commented Nov 3, 2025

Description

This PR is contributing to #1541.

As the title suggests, this PR introduces StringBlob as an efficient storage format, replacing std::vector<std::string>.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows passed.
  • Add basic unit tests to cover the behavior.

Summary by CodeRabbit

  • Tests

    • Added unit tests validating incremental buffered string reads, indexed string retrieval after each append, and proper end-of-data error handling.
  • Chores

    • Introduced a compact string-buffer component that stores multiple strings with offset tracking for indexed access and robust read semantics.

@LinZhihao-723 LinZhihao-723 requested a review from a team as a code owner November 3, 2025 23:28
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 3, 2025

Walkthrough

Adds a new clp::ffi::StringBlob class for concatenated indexed strings, a unit test verifying incremental reads and EOF handling, and updates CMakeLists.txt to include the new test in unit test builds.

Changes

Cohort / File(s) Summary
Build configuration
components/core/CMakeLists.txt
Adds test_StringBlob.cpp to unit test source lists so the new test is compiled.
StringBlob header
components/core/src/clp/ffi/StringBlob.hpp
New header introducing clp::ffi::StringBlob with m_data and m_offsets, default ctor, get_num_strings(), get_string(size_t) returning std::optional<std::string_view>, and read_from(ReaderInterface&, size_t) returning std::optional<ErrorCode>; appends data and rolls back on read error.
Unit tests
components/core/src/clp/ffi/test/test_StringBlob.cpp
New unit test using a BufferReader to feed concatenated strings, asserting get_num_strings(), get_string(index), successful reads, and EOF/error on final read.

Sequence Diagram(s)

sequenceDiagram
    participant Test as UnitTest
    participant Blob as clp::ffi::StringBlob
    participant Reader as ReaderInterface

    Note over Test,Blob: Test performs repeated reads and validations
    Test->>Blob: construct()
    Test->>Reader: prepare buffer (concatenated strings)
    loop per expected string
        Test->>Blob: read_from(Reader, length)
        Blob->>Reader: request length
        alt success
            Reader-->>Blob: bytes
            Blob-->>Test: success (no ErrorCode)
            Test->>Blob: get_string(index)
            Blob-->>Test: std::string_view
        else EOF / read error
            Reader-->>Blob: error (EndOfFile)
            Blob-->>Test: returns ErrorCode
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas needing attention:
    • components/core/src/clp/ffi/StringBlob.hpp — verify offset arithmetic, bounds checking, and rollback behavior on read failure.
    • components/core/src/clp/ffi/test/test_StringBlob.cpp — confirm the test correctly simulates ReaderInterface and covers edge cases (EOF).
    • components/core/CMakeLists.txt — ensure the test file was added to the correct targets.

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and specifically describes the main change: introducing StringBlob for compact storage of multiple strings in a contiguous byte buffer, which aligns with the actual changes across CMakeLists.txt, the new StringBlob header, and test file.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbec677 and 3132f0f.

📒 Files selected for processing (1)
  • components/core/CMakeLists.txt (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: 2025-07-01T14:49:07.333Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Applied to files:

  • components/core/CMakeLists.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build (macos-15)
🔇 Additional comments (2)
components/core/CMakeLists.txt (2)

539-539: Verify necessity of including header file in source file list.

Adding src/clp/ffi/StringBlob.hpp to SOURCE_FILES_unitTest is unconventional; header files are typically not included in CMake source file lists for compilation. While some .hpp files appear elsewhere in this list (e.g., line 538), other similar API headers like encoding_methods.hpp (line 485) are not included.

Please verify whether this file needs to be in the source list or if it's already accessible through existing include directories (via target_include_directories). If unnecessary, consider removing it.


487-487: Reconsider placement of test file to align with existing grouping pattern.

The test file src/clp/ffi/test/test_StringBlob.cpp should be grouped with other ffi test files, currently clustered at lines 512–514 (ir_stream/search/test/ files). Placing it between encoding_methods.inc and ir_stream/byteswap.hpp breaks the logical organization and reduces maintainability.

Consider moving the test file to be adjacent to other ffi test sources for consistency.

⛔ Skipped due to learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a283379 and b41bca7.

📒 Files selected for processing (3)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp/ffi/StringBlob.hpp (1 hunks)
  • components/core/src/clp/ffi/test/test_StringBlob.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/StringBlob.hpp
  • components/core/src/clp/ffi/test/test_StringBlob.cpp
🧠 Learnings (8)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2024-10-07T20:10:08.254Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/CMakeLists.txt
  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2025-07-01T14:49:07.333Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Applied to files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
🧬 Code graph analysis (1)
components/core/src/clp/ffi/test/test_StringBlob.cpp (1)
components/core/src/clp/ffi/StringBlob.hpp (1)
  • reader (44-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (6)
components/core/src/clp/ffi/StringBlob.hpp (3)

1-18: LGTM!

The header structure, includes, and constructor are well organized and follow proper conventions.


21-21: LGTM!

The implementation correctly calculates the number of strings based on the offset count, and is safe given the initialization of m_offsets with {0}.


44-58: Consider more natural error comparison order.

The method logic is correct with proper error handling and rollback. However, the error comparison uses Yoda-style ordering which is less conventional in modern C++.

-        if (auto const err{reader.try_read_exact_length(m_data.data() + start_offset, length)};
-            ErrorCode::ErrorCode_Success != err)
+        if (auto const err{reader.try_read_exact_length(m_data.data() + start_offset, length)};
+            err != ErrorCode::ErrorCode_Success)
⛔ Skipped due to learnings
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
components/core/src/clp/ffi/test/test_StringBlob.cpp (1)

12-45: LGTM!

The test case provides comprehensive coverage of StringBlob functionality:

  • Verifies string count tracking before and after each insertion
  • Confirms successful reads return no error
  • Validates correct string retrieval by index
  • Tests end-of-file behavior

The NOLINTNEXTLINE suppressions are appropriate given the preceding has_value() checks.

components/core/CMakeLists.txt (2)

486-486: LGTM!

The test file is correctly added to the unit test sources.


538-538: LGTM!

The header file is correctly added to the unit test sources for proper IDE integration and dependency tracking.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/core/src/clp/ffi/StringBlob.hpp (1)

23-35: Past optimization suggestion still applies.

The implementation is correct. A previous review suggested constructing the string_view directly from offset pointer for efficiency, which remains a valid optimization.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b41bca7 and cee9ad7.

📒 Files selected for processing (2)
  • components/core/src/clp/ffi/StringBlob.hpp (1 hunks)
  • components/core/src/clp/ffi/test/test_StringBlob.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
  • components/core/src/clp/ffi/StringBlob.hpp
🧠 Learnings (7)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Applied to files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp/ffi/StringBlob.hpp
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.

Applied to files:

  • components/core/src/clp/ffi/StringBlob.hpp
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.

Applied to files:

  • components/core/src/clp/ffi/StringBlob.hpp
🧬 Code graph analysis (1)
components/core/src/clp/ffi/test/test_StringBlob.cpp (1)
components/core/src/clp/ffi/StringBlob.hpp (1)
  • reader (44-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
🔇 Additional comments (8)
components/core/src/clp/ffi/test/test_StringBlob.cpp (4)

1-19: LGTM! Test setup is clean and comprehensive.

The includes are appropriate, and the test data provides good coverage with varied string content.


21-39: LGTM! Test logic is thorough and correct.

The test properly verifies:

  • String counts before and after each read operation
  • Successful reads return std::nullopt
  • Retrieved strings match expected values

The NOLINTNEXTLINE suppressions are appropriate for the unavoidable optional accesses after REQUIRE assertions.


41-44: LGTM! EOF handling is properly tested.

The test correctly verifies that reading beyond the available data returns ErrorCode_EndOfFile.


32-32: Consider using explicit false comparison for consistency with coding guidelines.

The coding guideline prefers false == <expression> over negation. While REQUIRE_FALSE is idiomatic Catch2, you could use REQUIRE((false == result.has_value())) for strict guideline compliance.

As per coding guidelines.

⛔ Skipped due to learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:228-236
Timestamp: 2024-10-14T03:43:40.364Z
Learning: In `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, helper functions like `assert_kv_pair_log_event_creation_failure` return booleans to the caller, and the caller asserts using `REQUIRE` to provide clearer failure information.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in REQUIRE statements (e.g., `REQUIRE((expression))`) and in conditional expressions are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 988
File: components/log-viewer-webui/client/src/components/QueryBox/InputWithCaseSensitive/CaseSenstiveToggle.tsx:34-37
Timestamp: 2025-06-09T17:48:56.024Z
Learning: In the y-scope/clp project, prefer `false == <expression>` rather than `!<expression>` for boolean expressions in TypeScript/JavaScript files, as specified in the coding guidelines.
components/core/src/clp/ffi/StringBlob.hpp (4)

1-12: LGTM! Header structure is well-organized.

The header guard naming is correct, and includes are complete and necessary.


13-21: LGTM! Class structure and basic accessor are correct.

The get_num_strings() implementation correctly calculates the string count based on the offset vector (initialized with a single 0 element).


44-58: LGTM! Error handling and state management are robust.

The implementation correctly:

  • Resizes the buffer to accommodate the new string
  • Reverts the buffer size if the read fails (line 53)
  • Appends the new offset only on success
  • Returns appropriate error codes

60-66: LGTM! Private members and namespace closing are correct.

The member variables are appropriately typed, and m_offsets is correctly initialized with {0} to provide the starting offset. The namespace closing comment accurately reflects the opened namespace.

Copy link
Contributor

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty minor comment, but otherwise looks good.

Copy link
Contributor

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. PR title is fine as well.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cee9ad7 and fbec677.

📒 Files selected for processing (1)
  • components/core/src/clp/ffi/StringBlob.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/StringBlob.hpp
🧠 Learnings (4)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp/ffi/StringBlob.hpp
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.

Applied to files:

  • components/core/src/clp/ffi/StringBlob.hpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

Applied to files:

  • components/core/src/clp/ffi/StringBlob.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
🔇 Additional comments (1)
components/core/src/clp/ffi/StringBlob.hpp (1)

60-66: Smart initialization of m_offsets ensures correct initial state.

Initializing m_offsets with {0} is critical for maintaining the class invariant. It ensures that a default-constructed StringBlob correctly reports zero strings and that all indexing logic remains valid from the start.

Comment on lines +1 to +22
#ifndef CLP_FFI_STRINGBLOB_HPP
#define CLP_FFI_STRINGBLOB_HPP

#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

#include "../ErrorCode.hpp"
#include "../ReaderInterface.hpp"

namespace clp::ffi {
// Stores a list of strings as an indexable blob.
class StringBlob {
public:
// Constructors
StringBlob() = default;

// Methods
[[nodiscard]] auto get_num_strings() const -> size_t { return m_offsets.size() - 1; }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Well-structured header with safe initialization.

The header guards, includes, and class structure are clean and appropriate. The initialization of m_offsets{0} on line 62 ensures that get_num_strings() safely returns 0 for an empty blob without underflow risk.

Consider adding noexcept to get_num_strings() since it cannot throw:

-    [[nodiscard]] auto get_num_strings() const -> size_t { return m_offsets.size() - 1; }
+    [[nodiscard]] auto get_num_strings() const noexcept -> size_t { return m_offsets.size() - 1; }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#ifndef CLP_FFI_STRINGBLOB_HPP
#define CLP_FFI_STRINGBLOB_HPP
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>
#include "../ErrorCode.hpp"
#include "../ReaderInterface.hpp"
namespace clp::ffi {
// Stores a list of strings as an indexable blob.
class StringBlob {
public:
// Constructors
StringBlob() = default;
// Methods
[[nodiscard]] auto get_num_strings() const -> size_t { return m_offsets.size() - 1; }
#ifndef CLP_FFI_STRINGBLOB_HPP
#define CLP_FFI_STRINGBLOB_HPP
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>
#include "../ErrorCode.hpp"
#include "../ReaderInterface.hpp"
namespace clp::ffi {
// Stores a list of strings as an indexable blob.
class StringBlob {
public:
// Constructors
StringBlob() = default;
// Methods
[[nodiscard]] auto get_num_strings() const noexcept -> size_t { return m_offsets.size() - 1; }
🤖 Prompt for AI Agents
In components/core/src/clp/ffi/StringBlob.hpp around lines 1 to 22, the getter
get_num_strings() is non-throwing and should be marked noexcept; update its
declaration/definition to add noexcept (e.g., auto get_num_strings() const
noexcept -> size_t) so the compiler and callers know it won’t throw and to
enable better optimization and exception-safety guarantees.

Comment on lines +23 to +35
/**
* @param index
* @return A view of the string at the given `index` in the blob.
* @return std::nullopt if `index` is out of bounds.
*/
[[nodiscard]] auto get_string(size_t index) const -> std::optional<std::string_view> {
if (index >= get_num_strings()) {
return std::nullopt;
}
size_t const start_offset{m_offsets[index]};
size_t const end_offset{m_offsets[index + 1]};
return std::string_view{m_data}.substr(start_offset, end_offset - start_offset);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Bounds checking is correct; consider noexcept.

The bounds validation correctly ensures safe access to m_offsets[index + 1]. The method could be marked noexcept since it returns std::nullopt on invalid input rather than throwing.

-    [[nodiscard]] auto get_string(size_t index) const -> std::optional<std::string_view> {
+    [[nodiscard]] auto get_string(size_t index) const noexcept -> std::optional<std::string_view> {
🤖 Prompt for AI Agents
components/core/src/clp/ffi/StringBlob.hpp around lines 23-35: the method
get_string performs bounds checks and never throws; mark it noexcept to document
and enable optimizations — update the function signature to append noexcept
(both declaration and definition if separated) while leaving behavior unchanged
(return std::nullopt on out-of-bounds and return the std::string_view
otherwise).

Comment on lines +37 to +58
/**
* Reads a string of the given `length` from the `reader` and appends it to the blob.
* @param reader
* @param length The exact length of the string to read.
* @return std::nullopt on success.
* @return Forwards `ReaderInterface::try_read_exact_length`'s error code on failure.
*/
[[nodiscard]] auto read_from(ReaderInterface& reader, size_t length)
-> std::optional<ErrorCode> {
auto const start_offset{m_data.size()};
auto const end_offset{start_offset + length};
m_data.resize(static_cast<std::string::size_type>(end_offset));
// NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic)
if (auto const err{reader.try_read_exact_length(m_data.data() + start_offset, length)};
ErrorCode::ErrorCode_Success != err)
{
m_data.resize(start_offset);
return err;
}
m_offsets.emplace_back(end_offset);
return std::nullopt;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Excellent error handling with proper rollback; minor cast redundancy.

The read_from implementation correctly handles errors by rolling back m_data to its previous size before returning the error code. The logic properly maintains the invariant that m_offsets.size() - 1 equals the number of successfully stored strings.

Line 48 contains a redundant cast since end_offset is already size_t and std::string::size_type is typically size_t:

-        m_data.resize(static_cast<std::string::size_type>(end_offset));
+        m_data.resize(end_offset);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/**
* Reads a string of the given `length` from the `reader` and appends it to the blob.
* @param reader
* @param length The exact length of the string to read.
* @return std::nullopt on success.
* @return Forwards `ReaderInterface::try_read_exact_length`'s error code on failure.
*/
[[nodiscard]] auto read_from(ReaderInterface& reader, size_t length)
-> std::optional<ErrorCode> {
auto const start_offset{m_data.size()};
auto const end_offset{start_offset + length};
m_data.resize(static_cast<std::string::size_type>(end_offset));
// NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic)
if (auto const err{reader.try_read_exact_length(m_data.data() + start_offset, length)};
ErrorCode::ErrorCode_Success != err)
{
m_data.resize(start_offset);
return err;
}
m_offsets.emplace_back(end_offset);
return std::nullopt;
}
/**
* Reads a string of the given `length` from the `reader` and appends it to the blob.
* @param reader
* @param length The exact length of the string to read.
* @return std::nullopt on success.
* @return Forwards `ReaderInterface::try_read_exact_length`'s error code on failure.
*/
[[nodiscard]] auto read_from(ReaderInterface& reader, size_t length)
-> std::optional<ErrorCode> {
auto const start_offset{m_data.size()};
auto const end_offset{start_offset + length};
m_data.resize(end_offset);
// NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic)
if (auto const err{reader.try_read_exact_length(m_data.data() + start_offset, length)};
ErrorCode::ErrorCode_Success != err)
{
m_data.resize(start_offset);
return err;
}
m_offsets.emplace_back(end_offset);
return std::nullopt;
}
🤖 Prompt for AI Agents
In components/core/src/clp/ffi/StringBlob.hpp around lines 37 to 58, the call to
m_data.resize uses a redundant static_cast to std::string::size_type for
end_offset (which is already size_t); remove the unnecessary cast and call
m_data.resize(end_offset) directly to simplify the code and avoid redundant
conversion.

@LinZhihao-723 LinZhihao-723 merged commit f798f1f into y-scope:main Nov 4, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants