
Parallel MPI_Get on the same window provides wrong data when using mtl/psm2 and osc/ucx #10433

@michaellass

Description


Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

I can reproduce this issue with version 4.1.1 but not with versions 3.1.4, 3.1.1 and 2.1.2. So this looks to me like a regression in Open MPI 4 (see #10433 (comment)).

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

All mentioned versions were built using EasyBuild.

Please describe the system on which you are running

  • Operating system/version: Red Hat Enterprise Linux release 8.3 (Ootpa)
  • Computer hardware: Cray CS500, nodes contain two Intel(R) Xeon(R) Gold 6148(F) CPUs (system description)
  • Network type: 100Gbps Omni-Path

Details of the problem

I use one-sided communication to distribute parts of a 2000x2000 character array to the different MPI processes. For that, each process calls MPI_Get to fetch its data block from rank 0. MPI_Get is surrounded by calls to MPI_Win_fence as follows:

MPI_Win_fence(0, win);
MPI_Get(local, local_height*local_width, MPI_CHAR, 0, global_offset, 1, mydata_in_global, win);
MPI_Win_fence(0, win);
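
For context, here is a minimal, self-contained sketch of this pattern. It is not the actual reproducer from the gist linked below; the 2000x2000 array size, the row-block decomposition, and the use of MPI_Type_create_subarray to build mydata_in_global are assumptions made purely to illustrate one way win and mydata_in_global could be set up:

/* Sketch only: rank 0 exposes an N x N char array through an MPI window;
 * every rank fetches its block of rows with a single fenced MPI_Get.
 * Array size, row-block decomposition and the subarray datatype are
 * illustrative assumptions, not the code from the gist. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define GLOBAL_N 2000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    if (GLOBAL_N % nranks != 0) {
        if (rank == 0) fprintf(stderr, "number of ranks must divide %d\n", GLOBAL_N);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    const int local_height = GLOBAL_N / nranks;   /* rows per rank */
    const int local_width  = GLOBAL_N;            /* full rows     */

    /* Rank 0 owns the global array and exposes it through the window;
     * the other ranks expose an empty window. */
    char *global = NULL;
    if (rank == 0) {
        global = malloc((size_t)GLOBAL_N * GLOBAL_N);
        for (size_t i = 0; i < (size_t)GLOBAL_N * GLOBAL_N; i++)
            global[i] = 'A' + (char)(i % 26);
    }
    MPI_Win win;
    MPI_Win_create(global, rank == 0 ? (MPI_Aint)GLOBAL_N * GLOBAL_N : 0,
                   1 /* disp_unit in bytes */, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Datatype selecting this rank's block inside the global array.  Since
     * the subarray already encodes the block's position, the displacement
     * passed to MPI_Get is 0 in this sketch. */
    int sizes[2]    = {GLOBAL_N, GLOBAL_N};
    int subsizes[2] = {local_height, local_width};
    int starts[2]   = {rank * local_height, 0};
    MPI_Datatype mydata_in_global;
    MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_C,
                             MPI_CHAR, &mydata_in_global);
    MPI_Type_commit(&mydata_in_global);
    MPI_Aint global_offset = 0;

    char *local = malloc((size_t)local_height * local_width);

    /* The pattern in question: every rank issues its MPI_Get within the
     * same fence epoch. */
    MPI_Win_fence(0, win);
    MPI_Get(local, local_height * local_width, MPI_CHAR, 0, global_offset,
            1, mydata_in_global, win);
    MPI_Win_fence(0, win);

    /* Basic sanity check against the known initialization pattern. */
    size_t first = (size_t)rank * local_height * GLOBAL_N;
    if (local[0] != (char)('A' + first % 26))
        fprintf(stderr, "rank %d received wrong data\n", rank);

    MPI_Type_free(&mydata_in_global);
    MPI_Win_free(&win);
    free(local);
    free(global);
    MPI_Finalize();
    return 0;
}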

This works correctly on most systems that I work on. On one specific compute cluster that uses the PSM2 MTL by default, however, MPI_Get non-deterministically provides wrong data.

A workaround is to make sure that only one MPI_Get call is performed within each communication epoch:

for (int i = 0; i < nranks; i++) {
  MPI_Win_fence(0, win);
  if (rank == i) MPI_Get(local, local_height*local_width, MPI_CHAR, 0, global_offset, 1, mydata_in_global, win);
}
MPI_Win_fence(0, win);

The problem also goes away when the use of PSM2 is avoided by setting either of the following environment variables:

  • OMPI_MCA_mtl=^psm2
  • OMPI_MCA_pml=ob1

The problem only shows up with a sufficiently large number of processes (at least 10, more reliably 20) and only in some of many repeated program runs. So far, I have only tested on a single node, i.e., communication probably does not go over the network.

The semantics and correctness of one-sided communication are described in section 12.7 of the current MPI standard. It states that parallel MPI_Put operations to the same window within the same epoch are undefined behavior. However, I could not find such a statement for MPI_Get, and it seems counter-intuitive that multiple read accesses would influence each other.

Apart from Open MPI v2, v3, and v4, I also tested Intel MPI, which runs the program without issues.

You can find code that reproduces the problem in the following gist: https://gist.github.com/michaellass/85486202494a149c9f24f48ad1786497
I run it via the following script, which creates 50 output files that should all be identical:

for i in $(seq 50); do
  mpirun -np 20 ./reproducer $i.out
done

echo
echo "All runs should create the same checksum:"

sha1sum *.out
