@robdrynkin
Collaborator

Changelog entry

...

Changelog category

  • Experimental feature

Description for reviewers

...

Copilot AI review requested due to automatic review settings November 24, 2025 08:39
@github-actions

🔴 2025-11-24 08:40:52 UTC The validation of the Pull Request description has failed. Please update the description.

The changelog entry is less than 20 characters or missing.

@github-actions

github-actions bot commented Nov 24, 2025

2025-11-24 08:41:48 UTC Pre-commit check linux-x86_64-release-asan for 333b395 has started.
2025-11-24 08:41:52 UTC Artifacts will be uploaded here
2025-11-24 08:43:23 UTC ya make is running...
2025-11-24 09:43:14 UTC Check cancelled

Copilot finished reviewing on behalf of robdrynkin November 24, 2025 08:45
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for XDC (External Data Channel) shuffle capability for RDMA-based event transmission in the interconnect layer. This allows events to be transmitted with some sections sent via RDMA and others sent inline or through external channels, providing more flexible data transfer options.

Key changes:

  • Refactored RDMA serialization to support per-section transfer instead of all-or-nothing
  • Changed SerializeToRope interface from using IMemPool* to IRcBufAllocator* for more flexible memory allocation
  • Added IsPartRdma flag and RdmaCredsBuffer to track RDMA state per section rather than per event
  • Enhanced debug output for assertion failures in TCP session
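To make the per-section idea concrete, here is a minimal C++ sketch under stated assumptions. TSectionSketch, TRdmaCredSketch, TTransferPlan and PlanPerSection are hypothetical names, not the interconnect's real types, and the contiguous layout in one registered region is an assumption. It only illustrates the move from all-or-nothing to per-section routing: each RDMA-capable section contributes one credential (address, size, rkey, mirroring the SetAddress/SetSize/SetRkey calls quoted later in this review), while other sections are counted as inline bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the interconnect's real types.
struct TSectionSketch {
    std::size_t Size;    // bytes in this section of the serialized event
    bool IsRdmaCapable;  // per-section decision, analogous to what IsPartRdma tracks
};

struct TRdmaCredSketch {
    std::uint64_t Address;  // start of the section inside the registered region
    std::uint64_t Size;     // section length in bytes
    std::uint32_t Rkey;     // remote key of the registered region
};

struct TTransferPlan {
    std::vector<TRdmaCredSketch> Creds;  // plays the role of RdmaCredsBuffer
    std::size_t InlineBytes = 0;         // bytes sent through the regular channel
};

// Assumes the event's sections are laid out back to back in one registered
// region that starts at regionAddr. RDMA-capable sections each contribute one
// credential; all other sections are counted as inline payload.
TTransferPlan PlanPerSection(const std::vector<TSectionSketch>& sections,
                             std::uint64_t regionAddr, std::uint32_t rkey) {
    TTransferPlan plan;
    std::uint64_t offset = 0;
    for (const TSectionSketch& s : sections) {
        if (s.IsRdmaCapable) {
            plan.Creds.push_back({regionAddr + offset, s.Size, rkey});
        } else {
            plan.InlineBytes += s.Size;
        }
        offset += s.Size;
    }
    return plan;
}
```

With sections of 100 (RDMA), 20 (inline) and 300 (RDMA) bytes, the plan holds two credentials, the second starting 120 bytes into the region, plus 20 inline bytes.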

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Summary per file
File | Description
ydb/library/actors/interconnect/interconnect_channel.h | Added IsPartRdma flag and RdmaCredsBuffer to support per-section RDMA transfer; removed SendViaRdma struct that stored pre-computed RDMA artifacts
ydb/library/actors/interconnect/interconnect_channel.cpp | Refactored RDMA payload feeding to build credentials incrementally per section; changed serialization to use default allocator instead of RDMA memory pool
ydb/library/actors/core/event.h | Updated SerializeToRope signature to accept IRcBufAllocator* instead of IMemPool* for broader compatibility
ydb/library/actors/core/event_pb.h | Updated SerializeToRope signature in protobuf event implementation
ydb/library/actors/core/event_pb.cpp | Changed allocator usage from IMemPool to IRcBufAllocator, removed std::optional wrapper from allocation results
ydb/library/actors/interconnect/interconnect_tcp_session.cpp | Enhanced debug assertion with detailed error message showing before/after values
ydb/library/actors/interconnect/interconnect_tcp_input_session.cpp | Temporarily disabled checksum verification (commented out validation code)
ydb/library/actors/interconnect/ut_rdma/rdma_xdc_ut.cpp | Updated test to use TRdmaAllocatorWithFallback wrapper for RDMA memory pool allocation
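The std::optional removal in event_pb.cpp and the TRdmaAllocatorWithFallback wrapper in the test suggest a familiar pattern: try a limited pool first and fall back to ordinary memory, so the caller-facing allocation never fails. The sketch below is a guess at that pattern, not the real TRdmaAllocatorWithFallback; IBufAllocatorSketch, TBumpPoolSketch and TAllocatorWithFallbackSketch are hypothetical names, and the bump pool merely stands in for the RDMA memory pool.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical interface standing in for IRcBufAllocator: Allocate never returns nullptr.
struct IBufAllocatorSketch {
    virtual ~IBufAllocatorSketch() = default;
    virtual void* Allocate(std::size_t size) = 0;
};

// Hypothetical fixed-capacity pool standing in for the RDMA memory pool.
// It can run out of space, so it reports failure with nullptr.
class TBumpPoolSketch {
public:
    explicit TBumpPoolSketch(std::size_t capacity)
        : Storage(std::make_unique<std::byte[]>(capacity)), Capacity(capacity) {}

    void* TryAllocate(std::size_t size) {
        if (size > Capacity - Used) {
            return nullptr;  // pool exhausted
        }
        void* p = Storage.get() + Used;
        Used += size;
        return p;
    }

private:
    std::unique_ptr<std::byte[]> Storage;
    std::size_t Capacity;
    std::size_t Used = 0;
};

// Tries the pool first and falls back to ordinary heap memory, so callers
// always get a usable buffer and need no std::optional around the result.
class TAllocatorWithFallbackSketch final : public IBufAllocatorSketch {
public:
    explicit TAllocatorWithFallbackSketch(TBumpPoolSketch& pool) : Pool(pool) {}

    void* Allocate(std::size_t size) override {
        if (void* p = Pool.TryAllocate(size)) {
            ++PoolHitCount;
            return p;
        }
        // Fallback buffers stay owned by this allocator until it is destroyed.
        Fallback.push_back(std::make_unique<std::byte[]>(size));
        return Fallback.back().get();
    }

    std::size_t PoolHits() const { return PoolHitCount; }
    std::size_t FallbackHits() const { return Fallback.size(); }

private:
    TBumpPoolSketch& Pool;
    std::size_t PoolHitCount = 0;
    std::vector<std::unique_ptr<std::byte[]>> Fallback;
};
```

The trade-off of this design is that fallback memory is not registered with the RDMA device, so in a real implementation such buffers would presumably have to travel through the non-RDMA path.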

bool TEventOutputChannel::SerializeEventRdma(TEventHolder& event) {
if (!event.Buffer && event.Event) {
std::optional<TRope> rope = event.Event->SerializeToRope(RdmaMemPool.get());
// std::optional<TRope> rope = event.Event->SerializeToRope(RdmaMemPool.get());

Copilot AI Nov 24, 2025

The commented-out code on line 328 should be removed. Commented-out code makes the codebase harder to maintain and understand. If this old implementation needs to be preserved for reference, it should be documented in version control history or a code comment explaining why the change was made.

Suggested change
// std::optional<TRope> rope = event.Event->SerializeToRope(RdmaMemPool.get());

cred->SetAddress(reinterpret_cast<ui64>(memReg.GetAddr()));
cred->SetSize(memReg.GetSize());
cred->SetRkey(memReg.GetRKey(rdmaDeviceIndex));

Copilot AI Nov 24, 2025

[nitpick] The trailing whitespace on line 368 should be removed to maintain code cleanliness and consistency.

Suggested change

return true;
RdmaCredsBuffer.Clear();

return !Iter.Valid();

Copilot AI Nov 24, 2025

The return value logic is incorrect. The function should return true when the current part is complete (i.e., when PartLenRemain == 0), not just when the entire buffer is consumed. When UseXdcShuffle is enabled with multiple RDMA-capable sections, a single section may be smaller than the entire buffer. The correct return should be !Iter.Valid() || PartLenRemain == 0 to indicate the part is complete either when all data is consumed OR when the current part's bytes are fully processed.

Suggested change
return !Iter.Valid();
return !Iter.Valid() || PartLenRemain == 0;
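A minimal model of this disagreement, assuming a single rope (Pos and Total stand in for Iter and Iter.Valid()) and a caller that moves on to the next section once the function reports the part complete. TFeederSketch and Feed are hypothetical; the real control flow in interconnect_channel.cpp is not visible in this thread. The model reproduces the case the comment describes: a 40-byte section at the start of a 100-byte buffer is fully consumed, yet the original expression still reports it as incomplete.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified model of the feeding function.
// Pos/Total stand in for the rope iterator (Iter.Valid() == Pos < Total);
// PartLenRemain is the number of bytes left in the current section.
struct TFeederSketch {
    std::size_t Pos = 0;
    std::size_t Total = 0;
    std::size_t PartLenRemain = 0;

    bool IterValid() const { return Pos < Total; }

    // Consumes up to `budget` bytes of the current part and reports
    // whether the part is complete, using either the original or the suggested expression.
    bool Feed(std::size_t budget, bool useSuggestedFix) {
        std::size_t n = std::min({budget, PartLenRemain, Total - Pos});
        Pos += n;
        PartLenRemain -= n;
        if (useSuggestedFix) {
            return !IterValid() || PartLenRemain == 0;
        }
        return !IterValid();  // only signals completion once the whole buffer is consumed
    }
};
```

Both expressions agree when the section is the last one in the buffer; they differ only for a section that ends before the buffer does.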

Comment on lines +912 to +915
// if (checksum != expectedChecksum) {
// LOG_CRIT_IC_SESSION("ICIS05", "event checksum error Type# 0x%08" PRIx32, descr.Type);
// throw TExReestablishConnection{TDisconnectReason::ChecksumError()};
// }

Copilot AI Nov 24, 2025

Checksum verification has been disabled by commenting out the validation code. This creates a security/reliability concern as checksum errors will no longer be detected. If this is intentional for debugging purposes, it should be replaced with a proper feature flag or configuration option rather than commented-out code. If this is a temporary workaround, consider adding a TODO comment explaining why and when it should be re-enabled.

Suggested change
// if (checksum != expectedChecksum) {
// LOG_CRIT_IC_SESSION("ICIS05", "event checksum error Type# 0x%08" PRIx32, descr.Type);
// throw TExReestablishConnection{TDisconnectReason::ChecksumError()};
// }
if (checksum != expectedChecksum) {
LOG_CRIT_IC_SESSION("ICIS05", "event checksum error Type# 0x%08" PRIx32, descr.Type);
throw TExReestablishConnection{TDisconnectReason::ChecksumError()};
}
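If the check has to be switchable while the new path is being debugged, a runtime setting keeps the validation compiled in and enabled by default instead of commented out. A minimal sketch of that suggestion, assuming a hypothetical TChecksumSettingsSketch flag and using a std::runtime_error subclass in place of TExReestablishConnection; none of these names exist in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical setting; the PR's real session parameters are not shown here.
struct TChecksumSettingsSketch {
    bool VerifyChecksums = true;  // validation stays on unless explicitly disabled
};

// Hypothetical stand-in for TExReestablishConnection{TDisconnectReason::ChecksumError()}.
struct TChecksumMismatchSketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Returns true if the comparison was performed, false if checking is disabled.
// Throws TChecksumMismatchSketch when checking is enabled and the values differ.
bool CheckEventChecksum(const TChecksumSettingsSketch& settings,
                        std::uint32_t checksum, std::uint32_t expectedChecksum) {
    if (!settings.VerifyChecksums) {
        return false;
    }
    if (checksum != expectedChecksum) {
        throw TChecksumMismatchSketch("event checksum error");
    }
    return true;
}
```

Because the default is "on", forgetting to restore the setting cannot silently ship a build without checksum validation, which is the risk the comment raises about commented-out code.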

@github-actions

github-actions bot commented Nov 24, 2025

2025-11-24 08:53:26 UTC Pre-commit check linux-x86_64-relwithdebinfo for 333b395 has started.
2025-11-24 08:53:30 UTC Artifacts will be uploaded here
2025-11-24 08:55:03 UTC ya make is running...
2025-11-24 09:43:03 UTC Check cancelled

@github-actions

github-actions bot commented Nov 24, 2025

2025-11-24 09:44:39 UTC Pre-commit check linux-x86_64-release-asan for b815294 has started.
2025-11-24 09:44:55 UTC Artifacts will be uploaded here
2025-11-24 09:47:00 UTC ya make is running...
2025-11-24 09:49:22 UTC Check cancelled

@github-actions

github-actions bot commented Nov 24, 2025

2025-11-24 09:44:39 UTC Pre-commit check linux-x86_64-relwithdebinfo for b815294 has started.
2025-11-24 09:44:57 UTC Artifacts will be uploaded here
2025-11-24 09:47:06 UTC ya make is running...
2025-11-24 09:49:16 UTC Check cancelled

@github-actions

github-actions bot commented Nov 24, 2025

2025-11-24 09:49:50 UTC Pre-commit check linux-x86_64-relwithdebinfo for 2c59920 has started.
2025-11-24 09:49:57 UTC Artifacts will be uploaded here
2025-11-24 09:51:15 UTC ya make is running...

@github-actions

github-actions bot commented Nov 24, 2025

2025-11-24 09:49:55 UTC Pre-commit check linux-x86_64-release-asan for 2c59920 has started.
2025-11-24 09:49:58 UTC Artifacts will be uploaded here
2025-11-24 09:51:16 UTC ya make is running...
🟡 2025-11-24 11:33:49 UTC Some tests failed; follow the links below. This failure is not yet covered by the blocking policy.

Ya make output | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED
14786 | 14681 | 0 | 82 | 13 | 10

🟢 2025-11-24 11:33:58 UTC Build successful.
🟢 2025-11-24 11:34:22 UTC ydbd size 3.8 GiB changed* by -39.1 KiB, which is <= 0 Bytes vs main: OK

ydbd size | main: 7e555f7 | merge: 2c59920 | diff | diff %
ydbd size | 4 110 083 808 Bytes | 4 110 043 760 Bytes | -39.1 KiB | -0.001%
ydbd stripped size | 1 527 733 392 Bytes | 1 527 715 408 Bytes | -17.6 KiB | -0.001%
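The size figures above can be reproduced directly, assuming KiB means 1024 bytes and that the percentage is taken relative to the main build (both are assumptions about the bot's conventions).

```cpp
#include <cassert>
#include <cstdint>

// Byte counts copied from the size table.
constexpr std::int64_t kMainBytes = 4110083808;
constexpr std::int64_t kMergeBytes = 4110043760;
constexpr std::int64_t kMainStrippedBytes = 1527733392;
constexpr std::int64_t kMergeStrippedBytes = 1527715408;

// Signed difference (merge minus main) in KiB, assuming 1 KiB = 1024 bytes.
constexpr double DiffKiB(std::int64_t mainBytes, std::int64_t mergeBytes) {
    return static_cast<double>(mergeBytes - mainBytes) / 1024.0;
}

// Signed difference as a percentage of the main build, assuming main is the baseline.
constexpr double DiffPercent(std::int64_t mainBytes, std::int64_t mergeBytes) {
    return 100.0 * static_cast<double>(mergeBytes - mainBytes) / static_cast<double>(mainBytes);
}
```

The full binary shrinks by 40 048 bytes (about 39.11 KiB) and the stripped one by 17 984 bytes (about 17.56 KiB); both changes are roughly 0.001% of the binary, matching the rounded figures in the table.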

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.
