Delaying open produces different type of cftime object #6026

Closed
dougiesquire opened this issue Nov 25, 2021 · 3 comments

@dougiesquire

What happened:
Opening a dataset (e.g. a netCDF or zarr file) that has a time coordinate with use_cftime=True gives different cftime types depending on whether the open is delayed. When the open is wrapped in dask.delayed, the time coordinate is represented as cftime.datetime objects, whereas when it is not delayed, cftime.Datetime<Calendar> objects are used.

What you expected to happen:
Consistent cftime objects to be used, regardless of whether the opening task is delayed or not.

Minimal Complete Verifiable Example:

import dask
import numpy as np
import xarray as xr
from dask.distributed import LocalCluster, Client

cluster = LocalCluster()
client = Client(cluster)

# Write some data
var = np.random.random(4)
time = xr.cftime_range('2000-01-01', periods=4, calendar='julian')
ds = xr.Dataset(data_vars={'var': ('time', var)},
                coords={'time': time})
ds.to_netcdf('test.nc', mode='w')

# Open written data
ds1 = xr.open_dataset('test.nc', use_cftime=True)
print(f'ds1: {ds1.time} \n')

# Delayed open written data
ds2 = dask.delayed(xr.open_dataset)('test.nc', use_cftime=True)
ds2 = dask.compute(ds2)[0]
print(f'ds2: {ds2.time} \n')

# Operations like xr.open_mfdataset which use dask.delayed internally 
# when parallel=True (I think) produce the same result as ds2
ds3 = xr.open_mfdataset('test.nc', use_cftime=True, parallel=True)
print(f'ds3: {ds3.time}')

returns

ds1: <xarray.DataArray 'time' (time: 4)>
array([cftime.DatetimeJulian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeJulian(2000, 1, 2, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeJulian(2000, 1, 3, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeJulian(2000, 1, 4, 0, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-04 00:00:00 

ds2: <xarray.DataArray 'time' (time: 4)>
array([cftime.datetime(2000, 1, 1, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 2, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 3, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 4, 0, 0, 0, 0, calendar='julian', has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-04 00:00:00 

ds3: <xarray.DataArray 'time' (time: 4)>
array([cftime.datetime(2000, 1, 1, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 2, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 3, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 4, 0, 0, 0, 0, calendar='julian', has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-04 00:00:00
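
To compare the three cases programmatically rather than by reading the reprs, one option is to collect the class names of the elements in each time coordinate. This is only a minimal sketch; `element_type_names` is a hypothetical helper, not part of xarray or cftime.

```python
def element_type_names(values):
    """Return the set of fully qualified class names found in `values`."""
    return {f"{type(v).__module__}.{type(v).__qualname__}" for v in values}


# Hypothetical usage with the datasets from the example above:
#   element_type_names(ds1.time.values)
#   element_type_names(ds2.time.values)
# The two sets differ when the delayed open yields a different cftime class.
```

Comparing the returned sets for ds1, ds2 and ds3 makes the inconsistency easy to assert on in a test.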

Anything else we need to know?:
I noticed this because the DatetimeAccessor ceil, floor and round methods raise errors for cftime.datetime objects (but not for cftime.Datetime<Calendar> objects) for all calendar types other than 'gregorian'. For example,

ds3.time.dt.floor('D')

returns the following traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-613e63624953> in <module>
----> 1 ds3.time.dt.floor('D')

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in floor(self, freq)
    220         """
    221 
--> 222         return self._tslib_round_accessor("floor", freq)
    223 
    224     def ceil(self, freq):

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in _tslib_round_accessor(self, name, freq)
    202     def _tslib_round_accessor(self, name, freq):
    203         obj_type = type(self._obj)
--> 204         result = _round_field(self._obj.data, name, freq)
    205         return obj_type(result, name=name, coords=self._obj.coords, dims=self._obj.dims)
    206 

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in _round_field(values, name, freq)
    142         )
    143     else:
--> 144         return _round_through_series_or_index(values, name, freq)
    145 
    146 

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in _round_through_series_or_index(values, name, freq)
    110         method = getattr(values_as_cftimeindex, name)
    111 
--> 112     field_values = method(freq=freq).values
    113 
    114     return field_values.reshape(values.shape)

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in floor(self, freq)
    733         CFTimeIndex
    734         """
--> 735         return self._round_via_method(freq, _floor_int)
    736 
    737     def ceil(self, freq):

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in _round_via_method(self, freq, method)
    714 
    715         unit = _total_microseconds(offset.as_timedelta())
--> 716         values = self.asi8
    717         rounded = method(values, unit)
    718         return _cftimeindex_from_i8(rounded, self.date_type, self.name)

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in asi8(self)
    684         epoch = self.date_type(1970, 1, 1)
    685         return np.array(
--> 686             [
    687                 _total_microseconds(exact_cftime_datetime_difference(epoch, date))
    688                 for date in self.values

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in <listcomp>(.0)
    685         return np.array(
    686             [
--> 687                 _total_microseconds(exact_cftime_datetime_difference(epoch, date))
    688                 for date in self.values
    689             ],

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/resample_cftime.py in exact_cftime_datetime_difference(a, b)
    356     datetime.timedelta
    357     """
--> 358     seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
    359     seconds = int(round(seconds.total_seconds()))
    360     microseconds = b.microsecond - a.microsecond

src/cftime/_cftime.pyx in cftime._cftime.datetime.__sub__()

TypeError: cannot compute the time difference between dates with different calendars

My apologies for conflating two issues here. I'm happy to open a separate issue for this if that's preferred.

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.19.1.el8.nci.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: 2.9.5
cftime: 1.5.0
nc_time_axis: 1.4.0
PseudoNetCDF: None
rasterio: 1.2.4
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.11.2
distributed: 2021.11.2
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: 2021.05.0
cupy: None
pint: 0.18
sparse: None
setuptools: 49.6.0.post20210108
pip: 21.1.2
conda: 4.10.1
pytest: None
IPython: 7.24.0
sphinx: None

@spencerkclark
Member

Sorry for not responding to this issue earlier -- I think this is related to #5686 (see discussion and links there for more details). I can reproduce your issue with cftime version 1.5.0, and I tested things with cftime version 1.5.1 and it was fixed (i.e. cftime.DatetimeJulian objects are returned in all cases).
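
For anyone wanting to check their own environment, here is a rough sketch that compares the installed cftime version against 1.5.1, the release reported above as fixed. The helpers `parse_release` and `has_reported_fix` are hypothetical, and the parsing is deliberately simple (it ignores pre-release and post-release suffixes). Only 1.5.0 was confirmed as affected in this thread, so a False result for older releases does not by itself mean they show the problem.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Turn a version string like '1.5.0' into a tuple of ints: (1, 5, 0).

    Only the leading numeric components are kept; suffixes such as 'rc1'
    or '.post1' are ignored in this sketch.
    """
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_reported_fix(version_string, fixed_in=(1, 5, 1)):
    """True if `version_string` is at least the release reported as fixed."""
    return parse_release(version_string) >= fixed_in


try:
    installed = version("cftime")
    print(f"cftime {installed}: reported fix present = {has_reported_fix(installed)}")
except PackageNotFoundError:
    print("cftime is not installed")
```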

@dougiesquire
Author

Thanks @spencerkclark! Updating to cftime version 1.5.1 fixes the issue.

@spencerkclark
Member

Great! I'll go ahead and close this issue.
