Numerous IBM test failures using OFI MTL #8905

Closed
hppritcha opened this issue Apr 30, 2021 · 3 comments

Comments

@hppritcha
Member

It looks like PR #8536 broke the OFI MTL, at least in a non-CUDA environment. Numerous IBM tests failed after this PR was merged into master:

FAIL: allgather_gap_inter
FAIL: allreduce_nocommute_gap_inter
FAIL: reduce_scatter_nocommute_gap_inter
FAIL: reduce_scatter_block_nocommute_gap_inter
FAIL: iallgather_gap_inter
FAIL: iallgatherv_gap_inter
FAIL: iallreduce_nocommute_gap_inter
FAIL: igather_gap_inter
FAIL: igatherv_gap_inter
FAIL: ireduce_nocommute_gap_inter
FAIL: ireduce_scatter_nocommute_gap_inter
FAIL: ireduce_scatter_block_nocommute_gap_inter
FAIL: iscatter_gap_inter
FAIL: iscatterv_gap_inter

XFAIL: 0

FAIL: 14

FAIL: bcast_struct
FAIL: int_overflow
FAIL: op
FAIL: scatter_gap
FAIL: iallgather_gap
FAIL: iallreduce_nocommute_gap
FAIL: iallreduce_nocommute_gap_in_place
FAIL: ibcast_struct
FAIL: iexscan_nocommute_gap
FAIL: iexscan_nocommute_gap_in_place
FAIL: igather_gap
FAIL: ireduce_nocommute_gap
FAIL: ireduce_nocommute_gap_in_place
FAIL: ireduce_scatter_block_nocommute_gap
FAIL: ireduce_scatter_block_nocommute_gap_in_place
FAIL: ireduce_scatter_nocommute_gap
FAIL: ireduce_scatter_nocommute_gap_in_place
FAIL: iscan_nocommute_gap
FAIL: iscan_nocommute_gap_in_place
FAIL: iscatter_gap

@hppritcha hppritcha self-assigned this Apr 30, 2021
hppritcha added a commit to hppritcha/ompi that referenced this issue Apr 30, 2021
PR open-mpi#8536 introduced a regression in non-CUDA environments
when an application is using derived, but contiguous, datatypes.

Related to open-mpi#8905.

Signed-off-by: Howard Pritchard <[email protected]>
hppritcha added a commit to hppritcha/ompi that referenced this issue Apr 30, 2021
PR open-mpi#8536 introduced a regression in non-CUDA environments
when an application is using derived, but contiguous, datatypes.

Related to open-mpi#8905.

Signed-off-by: Howard Pritchard <[email protected]>
hppritcha added a commit to hppritcha/ompi that referenced this issue May 3, 2021
PR open-mpi#8536 introduced a regression in non-CUDA environments
when an application is using derived, but contiguous, datatypes.

Related to open-mpi#8905.

Signed-off-by: Howard Pritchard <[email protected]>
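
The commit messages above attribute the regression to datatypes that are derived yet contiguous. As a rough illustration of that distinction only, here is a toy Python model of datatype layouts. It is not Open MPI's datatype engine and says nothing about how the OFI MTL actually checks datatypes; `Datatype`, `type_contiguous`, and `type_vector` are hypothetical names that loosely mirror `MPI_Type_contiguous` and `MPI_Type_vector`, and the 8-byte `DOUBLE` is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

Block = Tuple[int, int]  # (byte offset, byte length)


@dataclass(frozen=True)
class Datatype:
    """Toy layout model: a datatype is a tuple of non-overlapping byte blocks."""
    blocks: Tuple[Block, ...]
    predefined: bool = False  # True only for built-in types such as DOUBLE

    @property
    def size(self) -> int:
        # Number of payload bytes actually described by the type.
        return sum(length for _, length in self.blocks)

    @property
    def extent(self) -> int:
        # Span from the lowest byte to one past the highest byte.
        lo = min(off for off, _ in self.blocks)
        hi = max(off + length for off, length in self.blocks)
        return hi - lo

    def is_contiguous(self) -> bool:
        # Contiguous means each block starts exactly where the previous one ends.
        ordered = sorted(self.blocks)
        return all(
            prev_off + prev_len == next_off
            for (prev_off, prev_len), (next_off, _) in zip(ordered, ordered[1:])
        )


# Assumption for this sketch: a double occupies 8 bytes.
DOUBLE = Datatype(blocks=((0, 8),), predefined=True)


def type_contiguous(count: int, base: Datatype) -> Datatype:
    """Hypothetical analogue of MPI_Type_contiguous: count copies, back to back."""
    blocks = tuple(
        (i * base.extent + off, length)
        for i in range(count)
        for off, length in base.blocks
    )
    return Datatype(blocks=blocks)


def type_vector(count: int, blocklength: int, stride: int, base: Datatype) -> Datatype:
    """Hypothetical analogue of MPI_Type_vector: stride is measured in base elements."""
    blocks = tuple(
        ((i * stride + j) * base.extent + off, length)
        for i in range(count)
        for j in range(blocklength)
        for off, length in base.blocks
    )
    return Datatype(blocks=blocks)


pair = type_contiguous(2, DOUBLE)      # derived, and still contiguous
gapped = type_vector(2, 1, 2, DOUBLE)  # derived, with an 8-byte hole

print(pair.predefined, pair.is_contiguous())      # derived but contiguous
print(gapped.predefined, gapped.is_contiguous())  # derived and non-contiguous
```

In this model, `pair` is the kind of type the commit message describes: it is not predefined, yet its bytes have no holes, so code that treats "derived" and "non-contiguous" as the same condition would classify it incorrectly. Whether that is the exact mistake in PR #8536 is not stated in this thread.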
@hppritcha
Member Author

Noticed that the CUDA cm PML changes were backported to 4.1.x, so adding that label too.

hppritcha added a commit to hppritcha/ompi that referenced this issue May 4, 2021
PR open-mpi#8536 introduced a regression in non-CUDA environments
when an application is using derived, but contiguous, datatypes.

Related to open-mpi#8905.

Signed-off-by: Howard Pritchard <[email protected]>
(cherry picked from commit 9e99182)
@hppritcha
Member Author

Well, it turns out the change that broke master and v5.0.x isn't in 4.1.x, so removing that label.

@hppritcha
Member Author

Closed via #8906 and #8920.
