-
Notifications
You must be signed in to change notification settings - Fork 84
feat(kv-ir): Add StringBlob for compact storage of multiple strings in a contiguous byte buffer.
#1544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds a new clp::ffi::StringBlob class for concatenated indexed strings, a unit test verifying incremental reads and EOF handling, and updates CMakeLists.txt to include the new test in unit test builds. Changes
Sequence Diagram(s)sequenceDiagram
participant Test as UnitTest
participant Blob as clp::ffi::StringBlob
participant Reader as ReaderInterface
Note over Test,Blob: Test performs repeated reads and validations
Test->>Blob: construct()
Test->>Reader: prepare buffer (concatenated strings)
loop per expected string
Test->>Blob: read_from(Reader, length)
Blob->>Reader: request length
alt success
Reader-->>Blob: bytes
Blob-->>Test: success (no ErrorCode)
Test->>Blob: get_string(index)
Blob-->>Test: std::string_view
else EOF / read error
Reader-->>Blob: error (EndOfFile)
Blob-->>Test: returns ErrorCode
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (5)📓 Common learnings📚 Learning: 2024-11-19T17:30:04.970ZApplied to files:
📚 Learning: 2025-08-09T04:07:27.083ZApplied to files:
📚 Learning: 2024-10-08T15:52:50.753ZApplied to files:
📚 Learning: 2025-07-01T14:49:07.333ZApplied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/core/CMakeLists.txt(2 hunks)components/core/src/clp/ffi/StringBlob.hpp(1 hunks)components/core/src/clp/ffi/test/test_StringBlob.cpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/ffi/StringBlob.hppcomponents/core/src/clp/ffi/test/test_StringBlob.cpp
🧠 Learnings (8)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2024-10-07T20:10:08.254Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Applied to files:
components/core/CMakeLists.txt
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/CMakeLists.txtcomponents/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2025-07-01T14:49:07.333Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Applied to files:
components/core/CMakeLists.txt
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Applied to files:
components/core/src/clp/ffi/test/test_StringBlob.cpp
🧬 Code graph analysis (1)
components/core/src/clp/ffi/test/test_StringBlob.cpp (1)
components/core/src/clp/ffi/StringBlob.hpp (1)
reader(44-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (6)
components/core/src/clp/ffi/StringBlob.hpp (3)
1-18: LGTM!The header structure, includes, and constructor are well organized and follow proper conventions.
21-21: LGTM!The implementation correctly calculates the number of strings based on the offset count, and is safe given the initialization of
m_offsetswith{0}.
44-58: Consider more natural error comparison order.The method logic is correct with proper error handling and rollback. However, the error comparison uses Yoda-style ordering which is less conventional in modern C++.
- if (auto const err{reader.try_read_exact_length(m_data.data() + start_offset, length)}; - ErrorCode::ErrorCode_Success != err) + if (auto const err{reader.try_read_exact_length(m_data.data() + start_offset, length)}; + err != ErrorCode::ErrorCode_Success)⛔ Skipped due to learnings
Learnt from: haiqi96 Repo: y-scope/clp PR: 523 File: components/core/src/clp/BufferedFileReader.cpp:96-106 Timestamp: 2024-10-24T14:45:26.265Z Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.Learnt from: haiqi96 Repo: y-scope/clp PR: 523 File: components/core/src/clp/BufferedFileReader.hpp:158-158 Timestamp: 2024-10-24T14:46:00.664Z Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.components/core/src/clp/ffi/test/test_StringBlob.cpp (1)
12-45: LGTM!The test case provides comprehensive coverage of
StringBlobfunctionality:
- Verifies string count tracking before and after each insertion
- Confirms successful reads return no error
- Validates correct string retrieval by index
- Tests end-of-file behavior
The NOLINTNEXTLINE suppressions are appropriate given the preceding
has_value()checks.components/core/CMakeLists.txt (2)
486-486: LGTM!The test file is correctly added to the unit test sources.
538-538: LGTM!The header file is correctly added to the unit test sources for proper IDE integration and dependency tracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/core/src/clp/ffi/StringBlob.hpp (1)
23-35: Past optimization suggestion still applies.The implementation is correct. A previous review suggested constructing the
string_viewdirectly from offset pointer for efficiency, which remains a valid optimization.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/ffi/StringBlob.hpp(1 hunks)components/core/src/clp/ffi/test/test_StringBlob.cpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/ffi/test/test_StringBlob.cppcomponents/core/src/clp/ffi/StringBlob.hpp
🧠 Learnings (7)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Applied to files:
components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp/ffi/StringBlob.hpp
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp/ffi/StringBlob.hpp
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.
Applied to files:
components/core/src/clp/ffi/StringBlob.hpp
🧬 Code graph analysis (1)
components/core/src/clp/ffi/test/test_StringBlob.cpp (1)
components/core/src/clp/ffi/StringBlob.hpp (1)
reader(44-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: package-image
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
🔇 Additional comments (8)
components/core/src/clp/ffi/test/test_StringBlob.cpp (4)
1-19: LGTM! Test setup is clean and comprehensive.The includes are appropriate, and the test data provides good coverage with varied string content.
21-39: LGTM! Test logic is thorough and correct.The test properly verifies:
- String counts before and after each read operation
- Successful reads return
std::nullopt- Retrieved strings match expected values
The NOLINTNEXTLINE suppressions are appropriate for the unavoidable optional accesses after
REQUIREassertions.
41-44: LGTM! EOF handling is properly tested.The test correctly verifies that reading beyond the available data returns
ErrorCode_EndOfFile.
32-32: Consider using explicit false comparison for consistency with coding guidelines.The coding guideline prefers
false == <expression>over negation. WhileREQUIRE_FALSEis idiomatic Catch2, you could useREQUIRE((false == result.has_value()))for strict guideline compliance.As per coding guidelines.
⛔ Skipped due to learnings
Learnt from: LinZhihao-723 Repo: y-scope/clp PR: 558 File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:228-236 Timestamp: 2024-10-14T03:43:40.364Z Learning: In `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, helper functions like `assert_kv_pair_log_event_creation_failure` return booleans to the caller, and the caller asserts using `REQUIRE` to provide clearer failure information.Learnt from: Bill-hbrhbr Repo: y-scope/clp PR: 0 File: :0-0 Timestamp: 2025-03-18T07:27:54.738Z Learning: Double parentheses in REQUIRE statements (e.g., `REQUIRE((expression))`) and in conditional expressions are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.Learnt from: AVMatthews Repo: y-scope/clp PR: 595 File: components/core/tests/test-clp_s-end_to_end.cpp:109-110 Timestamp: 2024-11-29T22:50:17.206Z Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.Learnt from: junhaoliao Repo: y-scope/clp PR: 988 File: components/log-viewer-webui/client/src/components/QueryBox/InputWithCaseSensitive/CaseSenstiveToggle.tsx:34-37 Timestamp: 2025-06-09T17:48:56.024Z Learning: In the y-scope/clp project, prefer `false == <expression>` rather than `!<expression>` for boolean expressions in TypeScript/JavaScript files, as specified in the coding guidelines.components/core/src/clp/ffi/StringBlob.hpp (4)
1-12: LGTM! Header structure is well-organized.The header guard naming is correct, and includes are complete and necessary.
13-21: LGTM! Class structure and basic accessor are correct.The
get_num_strings()implementation correctly calculates the string count based on the offset vector (initialized with a single 0 element).
44-58: LGTM! Error handling and state management are robust.The implementation correctly:
- Resizes the buffer to accommodate the new string
- Reverts the buffer size if the read fails (line 53)
- Appends the new offset only on success
- Returns appropriate error codes
60-66: LGTM! Private members and namespace closing are correct.The member variables are appropriately typed, and
m_offsetsis correctly initialized with{0}to provide the starting offset. The namespace closing comment accurately reflects the opened namespace.
gibber9809
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty minor comment, but otherwise looks good.
Co-authored-by: Devin Gibson <[email protected]>
gibber9809
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. PR title is fine as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp/ffi/StringBlob.hpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/ffi/StringBlob.hpp
🧠 Learnings (4)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp/ffi/StringBlob.hpp
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp/ffi/StringBlob.hpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Applied to files:
components/core/src/clp/ffi/StringBlob.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
🔇 Additional comments (1)
components/core/src/clp/ffi/StringBlob.hpp (1)
60-66: Smart initialization of m_offsets ensures correct initial state.Initializing
m_offsetswith{0}is critical for maintaining the class invariant. It ensures that a default-constructedStringBlobcorrectly reports zero strings and that all indexing logic remains valid from the start.
| #ifndef CLP_FFI_STRINGBLOB_HPP | ||
| #define CLP_FFI_STRINGBLOB_HPP | ||
|
|
||
| #include <cstddef> | ||
| #include <optional> | ||
| #include <string> | ||
| #include <string_view> | ||
| #include <vector> | ||
|
|
||
| #include "../ErrorCode.hpp" | ||
| #include "../ReaderInterface.hpp" | ||
|
|
||
| namespace clp::ffi { | ||
| // Stores a list of strings as an indexable blob. | ||
| class StringBlob { | ||
| public: | ||
| // Constructors | ||
| StringBlob() = default; | ||
|
|
||
| // Methods | ||
| [[nodiscard]] auto get_num_strings() const -> size_t { return m_offsets.size() - 1; } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Well-structured header with safe initialization.
The header guards, includes, and class structure are clean and appropriate. The initialization of m_offsets{0} on line 62 ensures that get_num_strings() safely returns 0 for an empty blob without underflow risk.
Consider adding noexcept to get_num_strings() since it cannot throw:
- [[nodiscard]] auto get_num_strings() const -> size_t { return m_offsets.size() - 1; }
+ [[nodiscard]] auto get_num_strings() const noexcept -> size_t { return m_offsets.size() - 1; }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #ifndef CLP_FFI_STRINGBLOB_HPP | |
| #define CLP_FFI_STRINGBLOB_HPP | |
| #include <cstddef> | |
| #include <optional> | |
| #include <string> | |
| #include <string_view> | |
| #include <vector> | |
| #include "../ErrorCode.hpp" | |
| #include "../ReaderInterface.hpp" | |
| namespace clp::ffi { | |
| // Stores a list of strings as an indexable blob. | |
| class StringBlob { | |
| public: | |
| // Constructors | |
| StringBlob() = default; | |
| // Methods | |
| [[nodiscard]] auto get_num_strings() const -> size_t { return m_offsets.size() - 1; } | |
| #ifndef CLP_FFI_STRINGBLOB_HPP | |
| #define CLP_FFI_STRINGBLOB_HPP | |
| #include <cstddef> | |
| #include <optional> | |
| #include <string> | |
| #include <string_view> | |
| #include <vector> | |
| #include "../ErrorCode.hpp" | |
| #include "../ReaderInterface.hpp" | |
| namespace clp::ffi { | |
| // Stores a list of strings as an indexable blob. | |
| class StringBlob { | |
| public: | |
| // Constructors | |
| StringBlob() = default; | |
| // Methods | |
| [[nodiscard]] auto get_num_strings() const noexcept -> size_t { return m_offsets.size() - 1; } |
🤖 Prompt for AI Agents
In components/core/src/clp/ffi/StringBlob.hpp around lines 1 to 22, the getter
get_num_strings() is non-throwing and should be marked noexcept; update its
declaration/definition to add noexcept (e.g., auto get_num_strings() const
noexcept -> size_t) so the compiler and callers know it won’t throw and to
enable better optimization and exception-safety guarantees.
| /** | ||
| * @param index | ||
| * @return A view of the string at the given `index` in the blob. | ||
| * @return std::nullopt if `index` is out of bounds. | ||
| */ | ||
| [[nodiscard]] auto get_string(size_t index) const -> std::optional<std::string_view> { | ||
| if (index >= get_num_strings()) { | ||
| return std::nullopt; | ||
| } | ||
| size_t const start_offset{m_offsets[index]}; | ||
| size_t const end_offset{m_offsets[index + 1]}; | ||
| return std::string_view{m_data}.substr(start_offset, end_offset - start_offset); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Bounds checking is correct; consider noexcept.
The bounds validation correctly ensures safe access to m_offsets[index + 1]. The method could be marked noexcept since it returns std::nullopt on invalid input rather than throwing.
- [[nodiscard]] auto get_string(size_t index) const -> std::optional<std::string_view> {
+ [[nodiscard]] auto get_string(size_t index) const noexcept -> std::optional<std::string_view> {🤖 Prompt for AI Agents
components/core/src/clp/ffi/StringBlob.hpp around lines 23-35: the method
get_string performs bounds checks and never throws; mark it noexcept to document
and enable optimizations — update the function signature to append noexcept
(both declaration and definition if separated) while leaving behavior unchanged
(return std::nullopt on out-of-bounds and return the std::string_view
otherwise).
| /** | ||
| * Reads a string of the given `length` from the `reader` and appends it to the blob. | ||
| * @param reader | ||
| * @param length The exact length of the string to read. | ||
| * @return std::nullopt on success. | ||
| * @return Forwards `ReaderInterface::try_read_exact_length`'s error code on failure. | ||
| */ | ||
| [[nodiscard]] auto read_from(ReaderInterface& reader, size_t length) | ||
| -> std::optional<ErrorCode> { | ||
| auto const start_offset{m_data.size()}; | ||
| auto const end_offset{start_offset + length}; | ||
| m_data.resize(static_cast<std::string::size_type>(end_offset)); | ||
| // NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic) | ||
| if (auto const err{reader.try_read_exact_length(m_data.data() + start_offset, length)}; | ||
| ErrorCode::ErrorCode_Success != err) | ||
| { | ||
| m_data.resize(start_offset); | ||
| return err; | ||
| } | ||
| m_offsets.emplace_back(end_offset); | ||
| return std::nullopt; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Excellent error handling with proper rollback; minor cast redundancy.
The read_from implementation correctly handles errors by rolling back m_data to its previous size before returning the error code. The logic properly maintains the invariant that m_offsets.size() - 1 equals the number of successfully stored strings.
Line 48 contains a redundant cast since end_offset is already size_t and std::string::size_type is typically size_t:
- m_data.resize(static_cast<std::string::size_type>(end_offset));
+ m_data.resize(end_offset);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /** | |
| * Reads a string of the given `length` from the `reader` and appends it to the blob. | |
| * @param reader | |
| * @param length The exact length of the string to read. | |
| * @return std::nullopt on success. | |
| * @return Forwards `ReaderInterface::try_read_exact_length`'s error code on failure. | |
| */ | |
| [[nodiscard]] auto read_from(ReaderInterface& reader, size_t length) | |
| -> std::optional<ErrorCode> { | |
| auto const start_offset{m_data.size()}; | |
| auto const end_offset{start_offset + length}; | |
| m_data.resize(static_cast<std::string::size_type>(end_offset)); | |
| // NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic) | |
| if (auto const err{reader.try_read_exact_length(m_data.data() + start_offset, length)}; | |
| ErrorCode::ErrorCode_Success != err) | |
| { | |
| m_data.resize(start_offset); | |
| return err; | |
| } | |
| m_offsets.emplace_back(end_offset); | |
| return std::nullopt; | |
| } | |
| /** | |
| * Reads a string of the given `length` from the `reader` and appends it to the blob. | |
| * @param reader | |
| * @param length The exact length of the string to read. | |
| * @return std::nullopt on success. | |
| * @return Forwards `ReaderInterface::try_read_exact_length`'s error code on failure. | |
| */ | |
| [[nodiscard]] auto read_from(ReaderInterface& reader, size_t length) | |
| -> std::optional<ErrorCode> { | |
| auto const start_offset{m_data.size()}; | |
| auto const end_offset{start_offset + length}; | |
| m_data.resize(end_offset); | |
| // NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic) | |
| if (auto const err{reader.try_read_exact_length(m_data.data() + start_offset, length)}; | |
| ErrorCode::ErrorCode_Success != err) | |
| { | |
| m_data.resize(start_offset); | |
| return err; | |
| } | |
| m_offsets.emplace_back(end_offset); | |
| return std::nullopt; | |
| } |
🤖 Prompt for AI Agents
In components/core/src/clp/ffi/StringBlob.hpp around lines 37 to 58, the call to
m_data.resize uses a redundant static_cast to std::string::size_type for
end_offset (which is already size_t); remove the unnecessary cast and call
m_data.resize(end_offset) directly to simplify the code and avoid redundant
conversion.
Description
This PR is contributing to #1541.
As the title suggests, this PR introduces
StringBlobas an efficient storage format, replacingstd::vector<std::string>.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Tests
Chores