Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
Reproduced with v5.0.x branch and main. Works in v4.x and 3.x.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from source (git clone). ompi_info output:
Package: Open MPI huettig@alkione Distribution
Open MPI: 5.1.0a1
Open MPI repo revision: v2.x-dev-10918-g9f82d1cfcb
Open MPI release date: Unreleased developer copy
MPI API: 3.1.0
Ident string: 5.1.0a1
Prefix: /plp_scr1/utils/mpi_test/ompi_build
Configured architecture: x86_64-pc-linux-gnu
Configured by: huettig
Configured on: Mon Sep 4 14:04:42 UTC 2023
Configure host: alkione
Configure command line: '--prefix=/plp_scr1/utils/mpi_test/ompi_build'
'--with-cuda=/plp_scr1/utils/cuda'
'--with-flex=/flex'
'--with-ucx=/plp_scr1/utils/mpi_test/ucx_build'
'--enable-mca-no-build=btl-uct'
'--disable-man-pages'
Built by: huettig
Built on: Mon Sep 4 14:16:36 UTC 2023
Built host: alkione
C bindings: yes
Fort mpif.h: yes (all)
Fort use mpi: yes (full: ignore TKR)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
limitations in the gfortran compiler and/or Open
MPI, does not support the following: array
subsections, direct passthru (where possible) to
underlying Open MPI's C functionality
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: gcc
C compiler absolute: /bin/gcc
C compiler family name: GNU
C compiler version: 13.2.0
C++ compiler: g++
C++ compiler absolute: /bin/g++
Fort compiler: gfortran
Fort compiler abs: /bin/gfortran
Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
Fort 08 assumed shape: yes
Fort optional args: yes
Fort INTERFACE: yes
Fort ISO_FORTRAN_ENV: yes
Fort STORAGE_SIZE: yes
Fort BIND(C) (all): yes
Fort ISO_C_BINDING: yes
Fort SUBROUTINE BIND(C): yes
Fort TYPE,BIND(C): yes
Fort T,BIND(C,name="a"): yes
Fort PRIVATE: yes
Fort ABSTRACT: yes
Fort ASYNCHRONOUS: yes
Fort PROCEDURE: yes
Fort USE...ONLY: yes
Fort C_FUNLOC: yes
Fort f08 using wrappers: yes
Fort MPI_SIZEOF: yes
C profiling: yes
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: yes
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
OMPI progress: no, Event lib: yes)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
dl support: yes
Heterogeneous support: no
MPI_WTIME support: native
Symbol vis. support: yes
Host topology support: yes
IPv6 support: no
MPI extensions: affinity, cuda, ftmpi, rocm, shortfloat
Fault Tolerance support: yes
FT MPI support: yes
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.1.0)
MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.1.0)
MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.1.0)
MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.1.0)
MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.1.0)
MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
v5.1.0)
MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
v5.1.0)
MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.1.0)
MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
v5.1.0)
MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.1.0)
MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.1.0)
MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.1.0)
MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.1.0)
MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA coll: cuda (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA coll: monitoring (MCA v2.1.0, API v2.4.0, Component
v5.1.0)
MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.1.0)
MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
v5.1.0)
MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
v5.1.0)
MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component
v5.1.0)
MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA io: romio341 (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA op: avx (MCA v2.1.0, API v1.0.0, Component v5.1.0)
MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.1.0)
MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
v5.1.0)
MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.1.0)
MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v5.1.0)
MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.1.0)
MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.1.0)
MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component
v5.1.0)
MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.1.0)
MCA pml: ucx (MCA v2.1.0, API v2.1.0, Component v5.1.0)
MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.1.0)
MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
v5.1.0)
MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
v5.1.0)
MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.1.0)
MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.1.0)
MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
v5.1.0)
MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
v5.1.0)
Submodules:
22fe51cb7a961b6060fc5c48e659237cbe162566 3rd-party/openpmix (v1.1.3-3872-g22fe51cb)
ece4f3c45a07a069e5b8f9c5e641613dfcaeffc3 3rd-party/prrte (psrvr-v2.0.0rc1-4638-gece4f3c45a)
c1cfc910d92af43f8c27807a9a84c9c13f4fbc65 config/oac (heads/main)
Please describe the system on which you are running
- Operating system/version: RedHat 8
- Computer hardware: AMD x64
- Network type: Infiniband
Details of the problem
Our simulation uses two big indexed file types, one for scalar data and one for vector data. A single file holds multiple scalars/vectors, and we both write and read them. Some details:
- WRITE with the same file types works (verified).
- READ returns corrupted data after using the vector mask and switching back to the scalar one (heavily indexed, varying block/displacement sizes, all doubles).
- Offsets verified correct; no stack/heap corruption.
- Open MPI versions < 5 work.
- Not file-system dependent: tested on NFSv4 and a locally mounted FS with the same result.
- No MPI error is returned.
- The corrupted data appears to be partially leaked from previous reads (judging from the double values).
- Running with one task, a single big block, and zero displacement works.
Sorry, I cannot provide demo code. However, I can relatively easily test/recompile this scenario with a different branch/version. A rough sketch of the access pattern follows.
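To illustrate the pattern only (a minimal, hypothetical sketch, not our production code: the file name, offsets, block lengths, and displacements below are made up, and it assumes MPI_Type_indexed filetypes of MPI_DOUBLE with collective reads via MPI_File_read_all):

```c
/* Hypothetical sketch of the access pattern described above -- NOT our code.
 * All sizes, offsets and displacements are made up. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Per-rank indexed "masks": varying block lengths and displacements,
     * in units of MPI_DOUBLE.  The vector mask selects 3 doubles per entry. */
    int blk_sca[4] = { 2, 5, 1, 3 };
    int blk_vec[4], dis_sca[4], dis_vec[4], nsca = 0;
    for (int i = 0; i < 4; i++) {
        dis_sca[i] = rank * 64 + i * 16;   /* arbitrary layout        */
        blk_vec[i] = 3 * blk_sca[i];       /* 3 components per vector */
        dis_vec[i] = 3 * dis_sca[i];
        nsca      += blk_sca[i];
    }

    MPI_Datatype ftype_sca, ftype_vec;
    MPI_Type_indexed(4, blk_sca, dis_sca, MPI_DOUBLE, &ftype_sca);
    MPI_Type_indexed(4, blk_vec, dis_vec, MPI_DOUBLE, &ftype_vec);
    MPI_Type_commit(&ftype_sca);
    MPI_Type_commit(&ftype_vec);

    double *buf_s = malloc(nsca * sizeof(double));
    double *buf_v = malloc(3 * nsca * sizeof(double));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "fields.dat", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);

    /* 1) read a scalar field with the scalar mask -> OK */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, ftype_sca, "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, buf_s, nsca, MPI_DOUBLE, MPI_STATUS_IGNORE);

    /* 2) read a vector field with the vector mask -> OK */
    MPI_File_set_view(fh, 1024 * (MPI_Offset)sizeof(double), MPI_DOUBLE,
                      ftype_vec, "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, buf_v, 3 * nsca, MPI_DOUBLE, MPI_STATUS_IGNORE);

    /* 3) switch back to the scalar mask and read another scalar field:
     *    with v5.0.x/main buf_s now contains corrupted data that partially
     *    looks like values from the previous reads; with v4.x/v3.x it is
     *    correct.  No MPI error is returned at any point. */
    MPI_File_set_view(fh, 8192 * (MPI_Offset)sizeof(double), MPI_DOUBLE,
                      ftype_sca, "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, buf_s, nsca, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&ftype_sca);
    MPI_Type_free(&ftype_vec);
    free(buf_s);
    free(buf_v);
    MPI_Finalize();
    return 0;
}
```

In the real code the filetypes are far larger and the block/displacement pattern is irregular, but the sequence of view switches is the same.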
Best,
Christian