Fix moving average of preprocessed OHC data #325
Conversation
This is intended to address #324. I think the issue at the heart of this is something to do with combining dask and the rolling mean.
@milenaveneziani, could you check if this works for you in whatever environment(s) you have handy?
# (without dask)
dsPreprocessed = dsPreprocessed.drop('xtime')
write_netcdf(dsPreprocessed, self.preprocessedFileName)
dsPreprocessed = xarray.open_dataset(self.preprocessedFileName)
@pwolfram, if you have time to take a look at this, it's mainly a question of seeing if you're good with this solution for converting a multi-file data set to a single file data set or if you'd suggest some other way of handling the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding from S Hoyer is that this is the preferred way to handle these types of problems. The only issue I foresee is that this doesn't scale well as the data become larger. However, recent work has gone into developing parallel writing functionality in xarray, so I wouldn't worry about this in the short term.
I think this is a reasonable way to convert a multi-file dataset into a single file data set 👍
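For reference, a minimal sketch of this round trip in plain xarray terms. Note that write_netcdf is an MPAS-Analysis helper; ds.to_netcdf is used here as its generic equivalent, and the file names are illustrative, not taken from the PR.

import xarray

# Open the preprocessed output as a multi-file (dask-backed) dataset.
dsPreprocessed = xarray.open_mfdataset('preprocessed_ohc.*.nc')

# Drop the 'xtime' string variable, as in the change under review, and
# write the combined data out to a single netCDF file.
dsPreprocessed = dsPreprocessed.drop('xtime')
dsPreprocessed.to_netcdf('preprocessed_ohc_combined.nc')

# Re-open that file as an ordinary single-file dataset (no dask), which
# the subsequent rolling-mean step can handle.
dsPreprocessed = xarray.open_dataset('preprocessed_ohc_combined.nc')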
Looks good to me
'not be plotted.')
preprocessedReferenceRunName = 'None'
# rolling mean seems to have trouble with dask data sets to we |
Minor typo: 'to' should be 'so'
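For context, a hedged sketch of the kind of rolling-mean call this comment refers to; the window length, the 'Time' dimension name, and the file name are illustrative assumptions, not taken from the PR.

import xarray

# With the data re-read from a single file (no dask backing), a centered
# rolling mean along the time dimension behaves as expected.
dsPreprocessed = xarray.open_dataset('preprocessed_ohc_combined.nc')
dsRollingMean = dsPreprocessed.rolling(Time=12, center=True).mean()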
This is accomplished by writing out the multi-file data set and reading it in again as a single-file data set.
Force-pushed from e188671 to 668f697 (compare).
@milenaveneziani, is this something you might be able to test sometime soon?
sorry for the delay @xylar: I'll be testing this today.
@xylar: if I test this on edison, will I be using the xarray/dask version that was causing the problem?
@milenaveneziani, if you use e3sm-unified/1.1.3, I think that is the right version. Even if not, the important thing is that things work with that particular version.
Tested on edison with e3sm_unified_1.1.3 and all worked fine.
Thanks, @milenaveneziani!