-
Notifications
You must be signed in to change notification settings - Fork 52
Design Document: Generalize Calendar supported by Analysis #105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design Document: Generalize Calendar supported by Analysis #105
Conversation
design_docs/generalize_calendar.md
Outdated
the Gregorain ('gregorian') and the 365-day ('gregorian_noleap') calendars. It also needs to | ||
support, at a minimum, years between 0001 and 9999, and preferably arbitrary | ||
years both positive and negative. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great description of problem
design_docs/generalize_calendar.md
Outdated
date: 2017/02/09 <br> | ||
</h1> | ||
<h2> Summary </h2> | ||
Currently, the time variable in `xarray` data sets within MPAS-Analysis have two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have => has
design_docs/generalize_calendar.md
Outdated
|
||
A major challenge is that it seems that xarray cannot easily be forced to | ||
use an alternative representation of dates to the troublesome | ||
`numpy.datetime64` type. The most obvious alternative, `datetime.datetime`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Xarray issue references here would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aware of pydata/xarray#1084. What else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mainly speaking here about my own testing, rather than the xarray issues, but I guess it doesn't hurt to point to issues as well.
design_docs/generalize_calendar.md
Outdated
A major challenge is that it seems that xarray cannot easily be forced to | ||
use an alternative representation of dates to the troublesome | ||
`numpy.datetime64` type. The most obvious alternative, `datetime.datetime`, | ||
seemingly cannot be useddirectly in `xarray` because objects of this type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
useddirectly => used directly
The solution proposed herein is to store time as floating-point days since the | ||
reference date 0001-01-01 and to convert dates in this format to | ||
`datetime.datetime` and `MpasRelativeDelta` objects whenever mathematical | ||
manipulation of dates is required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is reasonable because the smallest unit of time needing to be accounted for is ms. Note, for clarity indicate that
>>> import sys
>>> sys.float_info.epsilon*24*60*60*1000
1.9184653865522705e-08
>>> sys.float_info.max/365.
4.925186670855659e+305
which indicates we can easily accommodate ms as the finest unit of time as well as long time periods corresponding to paleoclimate timescales. Note, we cannot support ns, but for climate scales we presently have no use case or notion of a use case where we need ns support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, it would be great if you could add this discussion to the design document because it justifies this choice and indicates why it is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that MPAS supports time intervals shorter than a second, but I could be wrong. I don't think we even need millisecond accuracy. But I'll include your argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think sys.float_info.max is useful for this purpose. It would be important to know how large a float needs to be before the steps between consecutive representable values are larger than seconds, and then divide that by 365 to get years. I don't really know how to do that. But it's going to be many, many orders of magnitude larger than anything we would ever want so it's not really worth worrying about.
design_docs/generalize_calendar.md
Outdated
on the appropriate MPAS calendar, either 'gregorain' or 'gregorain_noleap', depending | ||
on the namelist option 'config_calendar_type'. There must be ways of mathematically | ||
manipulating times (e.g. adding/subtracting offsets and figuring out the amount of time | ||
between two dates) and of making plots that are consistent with these calendars. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, if time is just a float of days I think we should not run into too many problems. In the worst case, units from days would have to be converted outside of xarray into s. The only real time we need a true date would be for input or output from the daysSinceReferenceDay
time coordinate, right? Otherwise, all operations would be on actual float days and math should be easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for plots I think this should be relatively straight-forward in terms of just having some CF compliant notion of how to do labeling, e.g., 'days since Reference time of 0001-01-01'. There is probably a way to have xarray inject this title automatically on produced plots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, regarding the first issue, there are actually lots of times we want to work with dates, which is why I've put so much work into functions to convert back and forth. We do math on years and months, for example, and that can't be done with the day formatting. But converting back and forth isn't too bad, just a little clunky to have an extra step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the second issue, there's a function plt.plot_date
that takes exactly the format I'm suggesting here: days since 0001-01-01. It doesn't support other reference dates, so that was one reason I chose that reference.
design_docs/generalize_calendar.md
Outdated
|
||
For all data sets used in the analysis, the 'Time' coordinate must, at a minimum, | ||
support years between 0001 and 9999 (the range of `datetime.datetim`) and preferably | ||
a broader range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implied solution above in the summary section does this cleanly, but perhaps the implied solution is in the wrong place although I'd lean towards having in the implementation section and the summary section because it is such a key idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided the same. I don't think there's a problem with giving away a bit about the intended solution in the summary and the details later. I'd like to keep it where it is (both places).
design_docs/generalize_calendar.md
Outdated
|
||
The conversion operations will be performed with the calendar-aware functions | ||
`netCDF4.date2num` and `netCDF4.num2date` and will support lists/arrays of dates | ||
(for efficiency and simplicity of calling code) in addition to single values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this solution has now changed and is no longer applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is still the solution. These function are just wrapped in the datetimeToDays
and daysToDatetime
functions for simplicity. I will clarify.
design_docs/generalize_calendar.md
Outdated
|
||
|
||
<h1> Design and Implementation </h1> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, I'm not clear exactly where we are in terms of design. Is the following below still applicable and needing review? Perhaps a call would be helpful here- I'm around via phone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I've got an implementation already that needs cleaning up based on various other PRs in progress. The branch exists if you want to take a look but I haven't make a PR because it's not ready for review yet.
@xylar: just so that I understand, is this design document intended for a last PR to better handle the time/calendar? |
@milenaveneziani, no, this is for a PR yet to come. It will build on top of #104, so that's the next priority. In the PR you just did (#102), there was some reorganizing to support a general calendar but nothing yet to address getting rid of I'm doing this in a lot of steps so each step is small enough to make sense on its own and isn't too overwhelming. It's a lot more work this way but should avoid mistakes and make sure everyone is on board. |
@xylar: yes, that's great. Do you think this will be the last PR to fix the problem (not trying to push here, I just want to make sure I understand. and it's also nice to see the finish line, especially for you I am sure :) |
design_docs/generalize_calendar.md
Outdated
</h2> | ||
|
||
For all data sets used in the analysis, the 'Time' coordinate must represent dates | ||
on the appropriate MPAS calendar, either 'gregorain' or 'gregorain_noleap', depending |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor typos on 'gregorian'
design_docs/generalize_calendar.md
Outdated
</h2> | ||
|
||
For all data sets used in the analysis, the 'Time' coordinate must, at a minimum, | ||
support years between 0001 and 9999 (the range of `datetime.datetim`) and preferably |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other tiny typo on datetime
@xylar: I don't think I have anything to add to this document. Looks great to me. |
design_docs/generalize_calendar.md
Outdated
``` | ||
**This violates the first requirement in the | ||
[MpasXarray class design doc](https://github.com/xylar/MPAS-Analysis/blob/design_doc_variable_mapping_reorg/design_docs/mpas_xarray_class.md). | ||
I am open to alternative solutions for keeping `mpas_xarray` separate from the rest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, I'm concerned about this issue and I think you will be, too. Please let me know what you think and if you have a clean solution. If a phone call would be best, let me know when you'll be free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what we need. One potential issue- we are at the point where it may no longer make sense to keep mpas_xarray separate from MPAS-Analysis, at least in the short term. Hence, the deviation from the prior design decision is warranted given challenges with xarray and its time coordinate. Hence, I think we need to corporately be ok with loosing independence between mpas_xarray and MPAS-Analysis, at least in the short term.
problem is exacerbated by the fact that there are analysis-specific functions in | ||
`timekeeping`, meaning that this cannot easily be made a submodule of `mpas_xarray` | ||
(nor would this make very much logical sense). Having 2 `timekeeping` modules, one | ||
for `mpas_xarray` and one for MPAS-Analysis, seems unnecessarily confunsing.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, why do we need to have the utilities associated with mpas_xarray
in the timekeeping.utility
file? As long as there are not a large number of these it is ok to include them in mpas_xarray
- this is the simplest solution and resolution to the problem.
The solution of having two timekeeping modules will be confusing long term and I vote against it.
We are at the point, e.g., timekeeping in xarray isn't robust enough to handle MPAS needs, that it may be both impractical and unreasonable to keep MPAS-Analysis and mpas_xarray separate. The whole point of mpas_xarray is to provide an xarray-compatable dataset that can leverages dask for performance. We can't fully do this without handling the troubling time issue, which admittedly is even a fix for the MPAS time datastructures. Hence, we are at the point where keeping mpas_xarray and MPAS-Analysis separate makes less sense and potentially may be confusing in the long term too. This being said, once time is fixed in xarray, we should consider doing a complete refractor (at least in terms of xarray interfacing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for giving this some thought.
why do we need to have the utilities associated with mpas_xarray in the timekeeping.utility file? As long as there are not a large number of these it is ok to include them in mpas_xarray- this is the simplest solution and resolution to the problem.
These utilities are not only needed in mpas_xarray
. They are also used (fairly extensively) in the analysis scripts themselves and also in StreamsFile
. In my view, a developer of MPAS-Analysis
wouldn't find it very intuitive that some timekeeping functions are found in mpas_xarray
while others are in timekeeping
. Yet putting everything in timekeeping
instead into mpas_xarray
would also not be very intuitive -- these utilities clearly have to do with timekeeping but much less clearly to do with xarray
.
I strongly prefer to keep things as they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my previous comments you'll note I'm at the point where I don't think a clean distinction between mpas_xarray and MPAS-Analysis is fully value-added like it used to be in the past. Hence, these types of design decisions are the best we can do and I'm in agreement with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, glad we're on the same page.
This design changes the Time variable in xarray datasets to be the number of days since 0001-01-01.
dc4fbea
to
3793096
Compare
@milenaveneziani, I have squashed the commits and this PR is ready to merge. |
This design doc describes the requirements, design, implementation and testing of an alternative approach for the time dimension of xarray data sets in MPAS-Analysis that can support either MPAS calendar ('gregorian' or 'gregorian_noleap').