@@ -58,24 +58,34 @@ included in the vX.Y.Z section and be denoted as:
(** also appeared: A.B.C) -- indicating that this item was previously
included in release version vA.B.C.

- 4.1.1 -- February , 2021
+ 4.1.1 -- March , 2021
-----------------------

- Reverted temporary solution that worked around launch issues in
SLURM v20.11.{0,1,2}. SchedMD encourages users to avoid these
versions and to upgrade to v20.11.3 or newer.
+ - Updated PMIx to v3.2.2.
- Fixed configuration issue on Apple Silicon observed with
Homebrew. Thanks to François-Xavier Coudert for reporting the issue.
- Disabled gcc built-in atomics by default on aarch64 platforms.
+ - Disabled UCX PML when UCX v1.8.0 is detected. UCX version 1.8.0 has a bug that
+ may cause data corruption when its TCP transport is used in conjunction with
+ the shared memory transport. UCX versions prior to v1.8.0 are not affected by
+ this issue. Thanks to @ksiazekm for reporting the issue.
+ - Fixed detection of available UCX transports/devices to better inform PML
+ prioritization.
- Fixed SLURM support to mark ORTE daemons as non-MPI tasks.
- Improved AVX detection to more accurately detect supported
platforms. Also improved the generated AVX code, and switched to
using word-based MCA params for the op/avx component (vs. numeric
big flags).
- Improved OFI compatibility support and fixed memory leaks in error
handling paths.
+ - Improved HAN collectives with support for Barrier and Scatter. Thanks
+ to @EmmanuelBRELLE for these changes and the relevant bug fixes.
- Fixed MPI debugger support (i.e., the MPIR_Breakpoint() symbol).
Thanks to @louisespellacy-arm for reporting the issue.
+ - Fixed ORTE bug that prevented debuggers from reading MPIR_Proctable.
- Removed PML uniformity check from the UCX PML to address performance
regression.
- Fixed MPI_Init_thread(3) statement about C++ binding and update
@@ -94,6 +104,19 @@ included in the vX.Y.Z section and be denoted as:
- Fixed bug to marked a generalized request as pending once initiated.
- Fixed external PMIx v4.x check.
- Fixed OSHMEM build with `--enable-mem-debug`.
+ - Fixed a performance regression observed with older versions of GCC when
+ __ATOMIC_SEQ_CST is used. Thanks to @BiplabRaut for reporting the issue.
+ - Fixed buffer allocation bug in the binomial tree scatter algorithm when
+ non-contiguous datatypes are used. Thanks to @sadcat11 for reporting the issue.
+ - Fixed bugs related to the accumulate and atomics functionality in the
+ osc/rdma component.
+ - Fixed race condition in MPI group operations observed with
+ MPI_THREAD_MULTIPLE threading level.
+ - Fixed a deadlock in the TCP BTL's connection matching logic.
+ - Fixed pml/ob1 compilation error when CUDA support is enabled.
+ - Fixed a build issue with Lustre caused by unnecessary header includes.
+ - Fixed a build issue with IMB LSF workload manager.
+ - Fixed linker error with UCX SPML.


4.1.0 -- December, 2020