Closed
Labels
needs triage: Issue that has not been reviewed by xarray team member
Description
What is your issue?
I am trying to open about 10 files of roughly 5 MB each as a test case, using xarray's open_mfdataset method with the parallel=True option. Without parallel=True the files open fine, but with it the call crashes with a "Segmentation fault" error, as shown below:
$ ipython
Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import xarray as xr
In [2]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10})
In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:              (time: 744, rlat: 140, rlon: 105)
Coordinates:
  * time                 (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0...
    lon                  (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray>
    lat                  (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray>
  * rlon                 (rlon) float64 342.1 342.2 342.2 ... 351.2 351.3 351.4
  * rlat                 (rlat) float64 -7.83 -7.74 -7.65 ... 4.5 4.59 4.68
Data variables:
    rotated_pole         (time) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
    RDRS_v2.1_P_UVC_10m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_FI_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_FB_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_A_PR0_SFC  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_P0_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_TT_1.5m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_HU_1.5m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
Attributes:
    CDI:          Climate Data Interface version 2.0.4 (https://mpimet.mpg.de...
    Conventions:  CF-1.6
    product:      RDRS_v2.1
    Remarks:      Variable names are following the convention <Product>_<Type...
    License:      These data are provided by the Canadian Surface Prediction ...
    history:      Mon Aug 28 13:44:02 2023: cdo -z zip -s -L -sellonlatbox,-1...
    NCO:          netCDF Operators version 5.0.6 (Homepage = http://nco.sf.ne...
    CDO:          Climate Data Operators version 2.0.4 (https://mpimet.mpg.de...
In [4]: type(ds)
Out[4]: xarray.core.dataset.Dataset
In [5]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10}, parallel=True)
[gra-login3:25527:0:6913] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
[gra-login3:25527] *** Process received signal ***
[gra-login3:25527] Signal: Segmentation fault (11)
[gra-login3:25527] Signal code: (128)
[gra-login3:25527] Failing at address: (nil)
Segmentation fault
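For anyone trying to reproduce this, here is a minimal sketch of how the failing call could be run in a child interpreter, so that the crash does not take down the interactive session. The helper `run_isolated` and the constant `REPRO` are hypothetical names introduced only for illustration; they are not part of xarray. The sketch assumes the same `ab_models_198001*.nc` files are in the working directory, and uses CPython's `-X faulthandler` option so that a Python-level traceback, if one is available, ends up in stderr.

```python
import subprocess
import sys


def run_isolated(snippet: str) -> subprocess.CompletedProcess:
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    Hypothetical helper, not part of xarray: a crash in the child process
    cannot kill the calling session, and its stderr is captured.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )


# Same call as In [5] above; the file pattern is taken from the report.
REPRO = """
import xarray as xr
ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time': 10}, parallel=True)
print(ds.sizes)
"""

if __name__ == "__main__":
    result = run_isolated(REPRO)
    # On POSIX, a negative return code means the child was killed by a signal.
    print("return code:", result.returncode)
    print(result.stderr)
```

Running the same snippet with `parallel=True` removed would give a side-by-side comparison of the working and the crashing call.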
Here is the output of xr.show_versions():
In [5]: xr.show_versions()
/home/user/virtual-envs/scienv/lib/python3.10/site-packages/_distutils_hack/__init__.py:36: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.88.1.el7.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.9.0
xarray: 2023.7.0
pandas: 1.4.0
numpy: 1.21.2
scipy: 1.8.0
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.0
distributed: 2023.8.0
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 60.2.0
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.10.0
sphinx: None
I'm working on an HPC cluster, so in case the list of modules I have loaded helps, here it is:
$ module list
Currently Loaded Modules:
 1) CCconfig
 2) gentoo/2020 (S)
 3) StdEnv/2020 (S)
 4) mii/1.1.2
 5) gcccore/.9.3.0 (H)
 6) imkl/2020.1.217 (math)
 7) gcc/9.3.0 (t)
 8) ucx/1.8.0
 9) libfabric/1.10.1
10) openmpi/4.0.3 (m)
11) libffi/3.3
12) python/3.10.2 (t)
13) ipykernel/2023a
14) scipy-stack/2023a (math)
15) hdf5/1.10.6 (io)
16) netcdf/4.7.4 (io)
17) sqlite/3.38.5
18) jasper/2.0.16 (vis)
19) libgeotiff-proj901/1.7.1
20) cfitsio/4.1.0 (vis)
21) postgresql/12.4 (t)
22) freexl/1.0.5 (t)
23) librttopo-proj9/1.1.0
24) libspatialite-proj901/5.0.1
25) gdal/3.5.1 (geo)
26) geos/3.10.2 (geo)
27) proj/9.0.1 (geo)
28) expat/2.4.1 (t)
29) udunits/2.2.28 (t)
30) libaec/1.0.6
31) eccodes/2.25.0 (geo)
32) yaxt/0.9.0 (t)
33) cdo/2.2.1 (geo)
34) mpi4py/3.1.3 (t)
35) netcdf-fortran/4.5.2 (io)
36) libspatialindex/1.8.5 (phys)
Where:
S: Module is Sticky, requires --force to unload or purge
m: MPI implementations / Implémentations MPI
math: Mathematical libraries / Bibliothèques mathématiques
io: Input/output software / Logiciel d'écriture/lecture
t: Tools for development / Outils de développement
vis: Visualisation software / Logiciels de visualisation
geo: Geography libraries/apps / Logiciels de géographie
phys: Physics libraries/apps / Logiciels de physique
H: Hidden Module
Thanks.