-
Notifications
You must be signed in to change notification settings - Fork 52
Replace the Time coordinate in MPAS xarray datasets to days since 0001-01-01 #111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The design document for this is #105. |
2bca421
to
a1889c3
Compare
@milenaveneziani and @pwolfram, this is ready for a review (along with the associated design doc, #105). @vonroekel, I'd appreciate a review if you have time/interest. I've marked this as "don't merge" because I'd like to squash commits before this branch gets merged. |
TestingI've run this on my laptop and performed the LANL test cases (including the new test I added in this branch). The climatology results are bit-for-bit the same but the time-series results are not because of changes in the dates and in the details of how these figures are being plotted. I will perform all available Edison tests as well and update this post when I have done so. |
sstClimatologyEndYear = 1900 | ||
# alternative values for present day | ||
#sstClimatologyStartYear = 1990 | ||
#sstClimatologyEndYear = 2011 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar: I think here we need a more general section of this kind:
# first and last year of SST climatology
# choose values that are consistent with the particular simulation time
sstClimatologyStartYear = 1
sstClimatologyEndYear = 10
This is because in most cases, including pre-industrial and present-day simulations, we have simulation years starting from 1, not 1870 or 1990. So we should change this in all config files, and also take this into account in sst_modelvsobs (see below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@milenaveneziani, hmm, I could understand this for comparing to another model run but for observations, there are real years over which those observations were taken. I think these numbers should correspond to those real years, not to the years of the model run (which might, indeed, start with 0001).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can check my other comment below, but basically I don't think we should compute different observational climatologies depending on the model years we choose for climos. Those are completely arbitrary for time-slice simulations anyway. I think we just want two (or one, depending on whether the pre-industrial observations are available or not, which in most cases they are not) climatologies for the observations: one for an 1870-1900 period and one for a more present-day period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, let me ask a question to make sure I understand: if ClimatologyStartYear is say 1870, but our simulation years go from say 1 to 250, wouldn't this be a problem in the sense that no model time will be larger than 1870?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@milenaveneziani, no. The model time is not used here at all. Instead, years 1870 to 1900 are loaded from the observational data and then averaged together to make a climatology (without ever knowing anything at all about the simulation time). Then, that climatology is compared with one made from the simulation.
Obviously, it would seem strangely redundant if you did have a simulation that covered, say, 1850 to 2000 and you said you wanted to make a climatology from 1870 to 1900 and you also had to say you wanted to compare with observation from 1870 to 1900, but that is what you would have to do with my suggested changes.
I agree that it would be much simpler to just have a single option instead -- period = 'pre-industrial'
or period = 'present-day'
and each observational climatology would be selected accordingly. But, again, I think that's outside the scope of this PR.
What can I do to address these concerns within the scope of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@milenaveneziani it looks like there are now two start and end dates in the climatology section, one being sstClimatologyStartYear, which only applies to the observations and the other being startDate which is for the model output. From my quick pass through the code, if startDate and sstClimatologyStartYear are appropriately selected, this should work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vanroekel, yes, that's exactly the idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, @vanroekel: I am sorry, I misunderstood the two config variables completely (I thought they were the general startYear and endYear for computing climatologies..).
So, I would say let's forget about this completely for this PR (and I agree that it is not a priority at the moment, since this only involves the observational climatologies, not the model climos as well), and please only change the text to:
# first and last year of SST observational climatology (could simply use examples given below)
# values for preindustrial period
sstClimatologyStartYear = 1870
sstClimatologyEndYear = 1900
# alternative values for present day period
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion! Done!
timeStart = datetime.datetime(year=climStartYear, month=1, day=1) | ||
timeEnd = datetime.datetime(year=climEndYear, month=12, day=31) | ||
|
||
if climStartYear < 1925: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work if our simulation years start from 1 (as in most cases). So, we probably need to add a config variable here for this particular diagnostics (SST climo): something like period = preindustrial
or period = presentday
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think these need to be real years corresponding to observations. So I don't think there will be a problem with using a real year as the cutoff. But I agree that we could have a config option to determine this name.
"Observations (Hadley/OI, {})".format(preindustrialText) | ||
"Observations (Hadley/OI, {} {:04d}-{:04d})".format(period, | ||
climStartYear, | ||
climEndYear) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @vanroekel should weigh in on this too, since this was originally his code. My take is that we should have two preprocessed climatologies for the SST: one for pre-industrial (years 1870-1900) and one for present-day (years 1900-present). I believe I have already calculated such averages for the annual mean, but we can do this easily for the seasonal averages as well. And then we just point to the right data set depending on the variable period
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, well maybe cleaning up this script further needs to be a separate PR after this one. I think this is too much to expect in this particular PR. All I was hoping to do was to make the current script work correctly without yearOffset
. I am happy to add one new config option for the period
of the observations to be used (pre-industrial
or present-day
) but I don't want to get into the whole can of worms of how the data gets preprocessed here.
Currently, the two time periods, pre-industrial and present-day are computed every time the script is run, so I agree that it would be a time savings (though perhaps a small one) to precompute them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Let's sort this out first and then we can do a separate PR for the preprocessing, if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ambivalent as to how the observations are preprocessed, but agree with @xylar that this is outside the scope of this PR. Certainly not doing the time averaging every time is more efficient, but I'm not convinced it will be a large cost savings. Note the SSS observations also require time averaging. If the desire of everyone is to preprocess, I'm happy to take this on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, @xylar, I don't it is necessary to add a period
config variable if the obs climatologies are computed on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, we can reconsider the period
option if we decide to preprocess the climatologies in the future.
calendar=calendar, | ||
simulationStartTime=simulationStartTime, | ||
timeVariableName='xtime') | ||
yearEndPreprocessed = days_to_datetime(dsPreprocessed.Time.min()).year |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be dsPreprocessed.Time.max()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
@xylar: I tested this on edison and 1) so nice to see actual model years now in the timeseries!; 2) I noticed a couple of small issues which I think will be easy to fix:
Here are the sea-ice timeseries plots with this newer branch and from the old code: |
@xylar: here is what I found out. I printed the following out of the sea-ice timeseries script:
So, while the latter contains times that are closer to mid-month for each month, the former contains times that fall more on the 20-23th day of the month, for some reason. Also, there are only 119 times instead of 120 (I chose yearStart=41 and yearEnd=50), which is also a little strange. |
@milenaveneziani, I've looked into the problem you report above carefully. I think there is little or nothing I can do in this PR, because the
This date is equal to |
@milenaveneziani, I fixed the issue with the last missing cycle in With regard to the differences between the two plots you posted, here are my thoughts:
At this point, I believe the results in this PR to be more correct than those in |
OK, so it is daysSinceStart: I suspected that might have the problem, but couldn't point exactly where in the code that was happening. I agree with all your points above, so from my point of view my tests have passed. Is there more testing that you think I should do, @xylar? @pwolfram: do you want to give this and #105 another look before @xylar prepares this for merging? |
@@ -8,3 +8,6 @@ dependencies: | |||
- xarray | |||
- matplotlib | |||
- dask | |||
- netcdf4 | |||
- hdf5 | |||
- hdf4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, can you please clarify why both hdf4
and hdf5
are needed? If I understand correctly, only one of these libraries will be used in netcdf4
and xarray
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, thanks for taking the time to do a review.
My understanding is that hdf5
should be needed for netcdf4. However, it was complaining that hdf4
was missing even when hdf5
was present. So I added hdf4
as well. If you can figure out how to get it working without hdf4
, I'm happy to remove it.
To be honest, I don't know why the required libraries (hdf4
, hdf5
, or both) weren't installed automatically. I'm just doing what it takes to fix the CI...
timeStart = clampToNumpyDatetime64(stringToDatetime(startDate), yearOffset) | ||
timeEnd = clampToNumpyDatetime64(stringToDatetime(endDate), yearOffset) | ||
timeStart = string_to_datetime(startDate) | ||
timeEnd = string_to_datetime(endDate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much nicer now!
|
||
ds.coords['month'] = ('Time', months) | ||
monthlyClimatology = ds.groupby('month').mean('Time') | ||
|
||
daysInMonth = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31] | ||
monthLetters = ['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should have / use these variables from the constants file location: https://github.com/MPAS-Dev/MPAS-Analysis/blob/develop/mpas_analysis/shared/constants/constants.py. I don't think we should just hard-code this type of information in these files, but this is out of scope for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to make an issue about that, go for it. Not related to this PR, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, in #113
totalDays = days | ||
for month in months[1:]: | ||
days = daysInMonth[month-1] | ||
climatology += days*monthlyClimotology.sel(month=month) | ||
climatology += days*monthlyClimatology.sel(month=month) | ||
totalDays += days | ||
climatology /= totalDays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be converted to a function-- again, out of scope for this PR but worth noting for future refractors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's very helpful to note these things here because they'll be forgotten as soon as this PR gets merged. You'd need to create issues for them instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, see #114.
@@ -357,19 +349,40 @@ def seaice_timeseries(config, streamMap=None, variableMap=None): | |||
variableName)) | |||
|
|||
|
|||
def replicate_cycle(ds, dsToReplicate): | |||
def replicate_cycle(ds, dsToReplicate, calendar): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a doc string for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that's a good idea. I'll add one.
if (title != None): | ||
'color': config.get('plot', 'titleFontColor'), | ||
'weight': config.get('plot', 'titleFontWeight')} | ||
if title is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, why the switch from !=
to is not? We have had this discussion in the past and the previous decision was to use
!=over
is notfor this type of question. This also begs the question of why isn't
title=None` in the function argument? Am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have read in various stacktrace posts that blah is not None
is far preferred to blah != None
. Here's an explanation of why. http://jaredgrubb.blogspot.de/2009/04/python-is-none-vs-none.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is really not the point of this PR...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just had this discussion before to the other outcome, that is why I asked. I agree that the is not None
approach is better. What we may want to add to the function signature is title=None
as a default, but that is up to you. It is the point of this PR because you are changing code, hence why I asked: !=
changed to is not
.
This brings up a good point-- in the future we probably want to keep style / formatting changes to their own PRs for clarity. We have been letting this slide for convenience but it is not best practices. In other words, each PR should be more narrowly focused and only change the lines relevant to its scope. This will also help greatly in the review process at the small cost of having more PRs to review, although syntax and formating-only PRs should be very easy to review, especially with the develop and master branch safety net.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, I think it's nearly impossible to keep formatting changes to their own PRs. I think that's ridiculously restrictive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'm at least willing to give it a try. That would mean we need a much faster turnaround on such PRs than on these other, more substantive ones.
Part of what makes that impractical is that you have to constantly rebase onto those PRs as you're trying to develop something more practical. That becomes impossibly cumbersome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding "is not None" vs "!= None", I'm quite certain we never reached the opposite conclusion because this is a standard I have known about for quite awhile. If you can point me to the discussion you're referring to, that would be helpful.
|
||
def _round_datetime(date): | ||
"""Round a datetime object to nearest second | ||
date : datetime.datetime or similar objet object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want / need full doc strings here? I'm ok with the simple ones presented but thought I'd ask, more so for my own information in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary for non-public functions.
calendar=calendar, | ||
simulationStartTime=simulationStartTime, | ||
timeVariableName=timestr, | ||
variableList=variableList) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be indented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is a correct PEP8 indentation convention, made necessary because simulationStartTime=simulationStartTime
is too long with the previous indentation convention. There is not a good way of indenting to match with open parenthesis without having the line be too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. I would tend to drop PEP8 convention over readability for these types of cases, but that is obviously a non-essential style issue. This is one of the drawbacks with pure adherence to PEP8 in my mind...
simulationStartTime=simulationStartTime, | ||
timeVariableName=timestr, | ||
variableList=variableList, | ||
selValues=selvals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is correct as it is.
simulationStartTime=simulationStartTime, | ||
timeVariableName=timestr, | ||
variableList=variableList, | ||
selValues=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation missing, last time commenting on this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think it's fine.
@pwolfram, I've either addressed or responded to your comments. Please let us know if there are remaining issues. |
I've approved the PR @xylar. I can test and/or merge this PR this afternoon @milenaveneziani if that is helpful. |
Utility functions for converting to/from days since 0001-01-01 are added to timekeeping/utility.py and unit testing of these new functions has been added. The time coordinate in mpas_xarray is changed to days since ref date, and the unit testing has been updated accordingly. The generalized_reader has also been updated to interface correctly with mpas_xarray, and its unit tests have also been updated. The timekeeping utility funcitons have their case/formatting changed to better follow PEP8 prescription. Other shared functions and unit tests are updated to handle the new function and argument names. Add config for a test sea ice run Add config options for SST obs. climatology Remove [time] section (no longer needed) Update timeseries plotting to handle new dates Ticks are now only years, with minor ticks for months. Update variable case for timekeeping Switch from underscores to CamelCase Add netcdf4 to CI In mpas_xarray._parse_date_string, simulationStartTime and referenceDate were being compared as strings, but are now compared as datetime objects so that equivalent dates are considered equal
836d93d
to
4f7bbe1
Compare
Okay, I squashed commits. I would suggest that @milenaveneziani do one last check on either Edison or Wolf to make sure things are fine and then merge. I don't anticipate any problems with the squash but best to be sure... |
tests passed again. all good. |
This merge modifies mpas_xarray and generalized_reader to produce a 'Time' coordinate that is the floating-point number of days since 0001-01-01.
This merge also updates some of the case/underscore conventions in the timekeeping module to use mixed case (CamelCase) for variable and function argument names.
The 5 analysis scripts have been updated to perform date computations by converting 'Time' coordinates to datetime objects (using
days_to_datetime
) and using the calendar-awareMpasRelativeDelta
class for computations with offsets in time.The
yearOffset
config option has been eliminated. New confg options have been added for selecting a range of dates to be used to create an SST climatology.A config file for an MPAS-SeaIce test case has been added, showing that a run with start date of 1958-01-01 can be accommodated.
Add NetCDF4 to dependencies of CI.