Skip to content

Implement cftime vectorization as discussed in PR #8322 #8324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 30, 2025

Conversation

antscloud
Copy link
Contributor

As discussed in #8322, here is the test for implementing the vectorization

Only this test seems to fail in test_coding_times.py :

@requires_cftime
@pytest.mark.parametrize("freq", ["U", "L", "S", "T", "H", "D"])
def test_encode_decode_roundtrip_cftime(freq) -> None:
initial_time = cftime_range("0001", periods=1)
times = initial_time.append(
cftime_range("0001", periods=2, freq=freq) + timedelta(days=291000 * 365)
)
variable = Variable(["time"], times)
encoded = conventions.encode_cf_variable(variable)
decoded = conventions.decode_cf_variable("time", encoded, use_cftime=True)
assert_equal(variable, decoded)

I don't really understand why though if you have an idea

@dcherian dcherian changed the title Implement vectorization as discussed in PR #8322 Implement cftime vectorization as discussed in PR #8322 Oct 17, 2023
@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label Oct 17, 2023
Copy link
Contributor

@kmuehlbauer kmuehlbauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antscloud Nice catch! I have two suggestion to get the test suite running.

@headtr1ck
Copy link
Collaborator

@antscloud are you willing to continue this? It would be a good addition.

@antscloud antscloud force-pushed the vectorization_of_encode_cftime branch from cff06d4 to f28fc7d Compare July 14, 2024 15:27
@antscloud
Copy link
Contributor Author

Hi, sorry i completely forgot about this issue, i've implemented the suggestions and all the tests appear to pass, I hope it works in the CI 🤞

@headtr1ck
Copy link
Collaborator

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

I didn't check in detail if we have a test that might cover this already and if yes, what the changes are.

@spencerkclark
Copy link
Member

I don't think we currently do, but I think this new benchmark should cover it: #9262.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @antscloud! I think this looks good to me now. Could you add a what's new entry under a new Performance heading?

@headtr1ck
Copy link
Collaborator

headtr1ck commented Jul 20, 2024

| Change | Before [91a52dc] | After [3512bee] | Ratio | Benchmark (Parameter) |
|----------|----------------------|---------------------|---------|-----------------------------------------------------------|
| - | 139±0.9ms | 42.3±0.7ms | 0.3 | coding.EncodeCFDatetime.time_encode_cf_datetime('noleap') |

3 times faster, not bad

Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

dates = np.atleast_1d(dates)

# Find all the None position
none_position = np.equal(dates, None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy complains here.
Not sure if this is an issue of wrong typing of numpy, or if this is not a recommended way of calling equals. (I think the None is the problem).

@headtr1ck
Copy link
Collaborator

headtr1ck commented Oct 14, 2024

The mypy error is still there, not sure how to effectively fix it, maybe use kwargs instead or simply type ignore it.

Otherwise this looks good and should get an entry in whats-new!

@kmuehlbauer
Copy link
Contributor

Thanks @antscloud! Still a valuable addition, which should be mentioned in whats-new.rst. @antscloud are you still around to finalize this? Otherwise I could take care of that.

@kmuehlbauer
Copy link
Contributor

Great, either I messed up the merge or this fails now, because of other concurring changes in the past time. @spencerkclark Do you spot anything suspicious here?

@spencerkclark
Copy link
Member

Thanks @kmuehlbauer for reviving this—I think the test failures are a manifestation of the cftime issue you found earlier (Unidata/cftime#354). The difference here from what was implemented in #9618 is that datetime.datetime objects are no longer converted to cftime.DatetimeProlepticGregorian objects prior to being passed to cftime.date2num.

While the issue was fixed in Unidata/cftime#355, a new version of cftime has not yet been released. We could lobby for that, or raise an error in these circumstances. Independent of the cftime issue, there may be a reasonable case for raising (#10352).

@kmuehlbauer
Copy link
Contributor

@spencerkclark Looks like this works after merging #10352. 🎉

@dcherian dcherian requested a review from spencerkclark May 27, 2025 17:34
Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @antscloud! Sorry this took so long to push through. I went ahead and added a what's new entry on your behalf. Feel free to edit as you see fit, but otherwise I will merge on Thursday.

@spencerkclark spencerkclark requested a review from kmuehlbauer May 27, 2025 22:23
@spencerkclark
Copy link
Member

@kmuehlbauer I'm guessing you approve now, but giving you the chance to take another look and remove your requested change (which I think is resolved).

@spencerkclark spencerkclark added the plan to merge Final call for comments label May 28, 2025
@spencerkclark
Copy link
Member

The benchmark workflow failure is unrelated, so I am going to go ahead and merge. Thanks again @antscloud!

@spencerkclark spencerkclark merged commit 8b961e4 into pydata:main May 30, 2025
31 of 32 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request May 30, 2025
* main:
  Fix performance regression in interp from pydata#9881 (pydata#10370)
  html repr: improve style for dropdown sections (pydata#10354)
  Grouper tweaks. (pydata#10362)
  Docs: Add links to getting help mermaid diagram (pydata#10324)
  Enforce ruff/flynt rules (FLY) (pydata#10375)
  Add missing AbstractWritableDataStore base methods and arguments (pydata#10343)
  Improve html repr in dark mode (Jupyterlab + Xarray docs) (pydata#10353)
  Pin Mypy to 1.15 (pydata#10378)
  use numpy dtype exposed by zarr array instead of metadata.data_type (pydata#10348)
  Fix doc typo for caption "Interoperability" (pydata#10374)
  Implement cftime vectorization as discussed in PR pydata#8322 (pydata#8324)
  Enforce ruff/flake8-pyi rules (PYI) (pydata#10359)
  Apply assorted ruff/Pylint rules (PL) / Enforce PLE rules (pydata#10366)
  (fix): pandas extension array repr for int64[pyarrow] (pydata#10317)
  Enforce ruff/flake8-implicit-str-concat rules (ISC) (pydata#10368)
  Enforce ruff/refurb rules (FURB) (pydata#10367)
  Ignore ruff/Pyflakes rule F401 more precisely (pydata#10369)
  Apply assorted ruff/flake8-simplify rules (SIM) (pydata#10364)
  Apply assorted ruff/flake8-pytest-style rules (PT) (pydata#10363)
  Fix "a array" misspelling (pydata#10365)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments run-benchmark Run the ASV benchmark workflow topic-cftime topic-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants