Skip to content

Design Document: Generalize Calendar supported by Analysis #105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Feb 9, 2017

This design doc describes the requirements, design, implementation and testing of an alternative approach for the time dimension of xarray data sets in MPAS-Analysis that can support either MPAS calendar ('gregorian' or 'gregorian_noleap').

the Gregorain ('gregorian') and the 365-day ('gregorian_noleap') calendars. It also needs to
support, at a minimum, years between 0001 and 9999, and preferably arbitrary
years both positive and negative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great description of problem

date: 2017/02/09 <br>
</h1>
<h2> Summary </h2>
Currently, the time variable in `xarray` data sets within MPAS-Analysis have two
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have => has


A major challenge is that it seems that xarray cannot easily be forced to
use an alternative representation of dates to the troublesome
`numpy.datetime64` type. The most obvious alternative, `datetime.datetime`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xarray issue references here would be good.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware of pydata/xarray#1084. What else?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mainly speaking here about my own testing, rather than the xarray issues, but I guess it doesn't hurt to point to issues as well.

A major challenge is that it seems that xarray cannot easily be forced to
use an alternative representation of dates to the troublesome
`numpy.datetime64` type. The most obvious alternative, `datetime.datetime`,
seemingly cannot be useddirectly in `xarray` because objects of this type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useddirectly => used directly

The solution proposed herein is to store time as floating-point days since the
reference date 0001-01-01 and to convert dates in this format to
`datetime.datetime` and `MpasRelativeDelta` objects whenever mathematical
manipulation of dates is required.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reasonable because the smallest unit of time needing to be accounted for is ms. Note, for clarity indicate that

>>> import sys
>>> sys.float_info.epsilon*24*60*60*1000
1.9184653865522705e-08
>>> sys.float_info.max/365.
4.925186670855659e+305

which indicates we can easily accommodate ms as the finest unit of time as well as long time periods corresponding to paleoclimate timescales. Note, we cannot support ns, but for climate scales we presently have no use case or notion of a use case where we need ns support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, it would be great if you could add this discussion to the design document because it justifies this choice and indicates why it is reasonable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that MPAS supports time intervals shorter than a second, but I could be wrong. I don't think we even need millisecond accuracy. But I'll include your argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think sys.float_info.max is useful for this purpose. It would be important to know how large a float needs to be before the steps between consecutive representable values are larger than seconds, and then divide that by 365 to get years. I don't really know how to do that. But it's going to be many, many orders of magnitude larger than anything we would ever want so it's not really worth worrying about.

on the appropriate MPAS calendar, either 'gregorain' or 'gregorain_noleap', depending
on the namelist option 'config_calendar_type'. There must be ways of mathematically
manipulating times (e.g. adding/subtracting offsets and figuring out the amount of time
between two dates) and of making plots that are consistent with these calendars.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, if time is just a float of days I think we should not run into too many problems. In the worst case, units from days would have to be converted outside of xarray into s. The only real time we need a true date would be for input or output from the daysSinceReferenceDay time coordinate, right? Otherwise, all operations would be on actual float days and math should be easy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for plots I think this should be relatively straight-forward in terms of just having some CF compliant notion of how to do labeling, e.g., 'days since Reference time of 0001-01-01'. There is probably a way to have xarray inject this title automatically on produced plots.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwolfram, regarding the first issue, there are actually lots of times we want to work with dates, which is why I've put so much work into functions to convert back and forth. We do math on years and months, for example, and that can't be done with the day formatting. But converting back and forth isn't too bad, just a little clunky to have an extra step.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the second issue, there's a function plt.plot_date that takes exactly the format I'm suggesting here: days since 0001-01-01. It doesn't support other reference dates, so that was one reason I chose that reference.


For all data sets used in the analysis, the 'Time' coordinate must, at a minimum,
support years between 0001 and 9999 (the range of `datetime.datetim`) and preferably
a broader range.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implied solution above in the summary section does this cleanly, but perhaps the implied solution is in the wrong place although I'd lean towards having in the implementation section and the summary section because it is such a key idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided the same. I don't think there's a problem with giving away a bit about the intended solution in the summary and the details later. I'd like to keep it where it is (both places).


The conversion operations will be performed with the calendar-aware functions
`netCDF4.date2num` and `netCDF4.num2date` and will support lists/arrays of dates
(for efficiency and simplicity of calling code) in addition to single values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this solution has now changed and is no longer applicable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is still the solution. These function are just wrapped in the datetimeToDays and daysToDatetime functions for simplicity. I will clarify.



<h1> Design and Implementation </h1>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, I'm not clear exactly where we are in terms of design. Is the following below still applicable and needing review? Perhaps a call would be helpful here- I'm around via phone.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I've got an implementation already that needs cleaning up based on various other PRs in progress. The branch exists if you want to take a look but I haven't make a PR because it's not ready for review yet.

@milenaveneziani
Copy link
Collaborator

@xylar: just so that I understand, is this design document intended for a last PR to better handle the time/calendar?

@xylar
Copy link
Collaborator Author

xylar commented Feb 10, 2017

@milenaveneziani, no, this is for a PR yet to come. It will build on top of #104, so that's the next priority.

In the PR you just did (#102), there was some reorganizing to support a general calendar but nothing yet to address getting rid of yearOffset or using that more general calendar in our xarray data sets.

I'm doing this in a lot of steps so each step is small enough to make sense on its own and isn't too overwhelming. It's a lot more work this way but should avoid mistakes and make sure everyone is on board.

@milenaveneziani
Copy link
Collaborator

@xylar: yes, that's great. Do you think this will be the last PR to fix the problem (not trying to push here, I just want to make sure I understand. and it's also nice to see the finish line, especially for you I am sure :)

</h2>

For all data sets used in the analysis, the 'Time' coordinate must represent dates
on the appropriate MPAS calendar, either 'gregorain' or 'gregorain_noleap', depending
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor typos on 'gregorian'

</h2>

For all data sets used in the analysis, the 'Time' coordinate must, at a minimum,
support years between 0001 and 9999 (the range of `datetime.datetim`) and preferably
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other tiny typo on datetime

@milenaveneziani
Copy link
Collaborator

@xylar: I don't think I have anything to add to this document. Looks great to me.

@xylar
Copy link
Collaborator Author

xylar commented Feb 11, 2017

We're getting there, but I'd like to merge #104 and #108 before I make a PR for the branch implementing this design.

```
**This violates the first requirement in the
[MpasXarray class design doc](https://github.com/xylar/MPAS-Analysis/blob/design_doc_variable_mapping_reorg/design_docs/mpas_xarray_class.md).
I am open to alternative solutions for keeping `mpas_xarray` separate from the rest
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwolfram, I'm concerned about this issue and I think you will be, too. Please let me know what you think and if you have a clean solution. If a phone call would be best, let me know when you'll be free.

Copy link
Contributor

@pwolfram pwolfram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we need. One potential issue- we are at the point where it may no longer make sense to keep mpas_xarray separate from MPAS-Analysis, at least in the short term. Hence, the deviation from the prior design decision is warranted given challenges with xarray and its time coordinate. Hence, I think we need to corporately be ok with loosing independence between mpas_xarray and MPAS-Analysis, at least in the short term.

problem is exacerbated by the fact that there are analysis-specific functions in
`timekeeping`, meaning that this cannot easily be made a submodule of `mpas_xarray`
(nor would this make very much logical sense). Having 2 `timekeeping` modules, one
for `mpas_xarray` and one for MPAS-Analysis, seems unnecessarily confunsing.**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, why do we need to have the utilities associated with mpas_xarray in the timekeeping.utility file? As long as there are not a large number of these it is ok to include them in mpas_xarray- this is the simplest solution and resolution to the problem.

The solution of having two timekeeping modules will be confusing long term and I vote against it.

We are at the point, e.g., timekeeping in xarray isn't robust enough to handle MPAS needs, that it may be both impractical and unreasonable to keep MPAS-Analysis and mpas_xarray separate. The whole point of mpas_xarray is to provide an xarray-compatable dataset that can leverages dask for performance. We can't fully do this without handling the troubling time issue, which admittedly is even a fix for the MPAS time datastructures. Hence, we are at the point where keeping mpas_xarray and MPAS-Analysis separate makes less sense and potentially may be confusing in the long term too. This being said, once time is fixed in xarray, we should consider doing a complete refractor (at least in terms of xarray interfacing).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for giving this some thought.

why do we need to have the utilities associated with mpas_xarray in the timekeeping.utility file? As long as there are not a large number of these it is ok to include them in mpas_xarray- this is the simplest solution and resolution to the problem.

These utilities are not only needed in mpas_xarray. They are also used (fairly extensively) in the analysis scripts themselves and also in StreamsFile. In my view, a developer of MPAS-Analysis wouldn't find it very intuitive that some timekeeping functions are found in mpas_xarray while others are in timekeeping. Yet putting everything in timekeeping instead into mpas_xarray would also not be very intuitive -- these utilities clearly have to do with timekeeping but much less clearly to do with xarray.

I strongly prefer to keep things as they are.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my previous comments you'll note I'm at the point where I don't think a clean distinction between mpas_xarray and MPAS-Analysis is fully value-added like it used to be in the past. Hence, these types of design decisions are the best we can do and I'm in agreement with you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, glad we're on the same page.

This design changes the Time variable in xarray datasets
to be the number of days since 0001-01-01.
@xylar xylar force-pushed the design_doc_generalize_calendar branch from dc4fbea to 3793096 Compare February 22, 2017 15:32
@xylar xylar removed the in progress label Feb 22, 2017
@xylar
Copy link
Collaborator Author

xylar commented Feb 22, 2017

@milenaveneziani, I have squashed the commits and this PR is ready to merge.

@milenaveneziani milenaveneziani merged commit 3793096 into MPAS-Dev:develop Feb 22, 2017
@xylar xylar deleted the design_doc_generalize_calendar branch February 23, 2017 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants