@xylar xylar commented Apr 5, 2018

This is accomplished by writing out the multi-file data set and reading it in again as a single-file data set.

@xylar xylar added the bug label Apr 5, 2018
@xylar xylar self-assigned this Apr 5, 2018

xylar commented Apr 5, 2018

This is intended to address #324. I think the issue at the heart of this is something to do with combining dask and the rolling operator in xarray but I don't have enough time or interest to make a proper issue on the xarray forum right now. This solution seems simple enough and the file generated is tiny.
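
For readers who want to see the workaround in isolation, here is a minimal sketch of the round trip in Python. It assumes a recent xarray with a netCDF backend installed. The helper names `to_single_file` and `rolling_time_mean` and the `Time` dimension name are illustrative assumptions, not the code in this PR, which uses the project's own `write_netcdf`.

```python
import numpy as np
import xarray as xr


def to_single_file(ds_multi, path):
    """Write a (possibly dask-backed) data set to one file and reopen it.

    Hypothetical helper: `ds_multi` would typically come from
    xarray.open_mfdataset, which returns dask-backed variables.
    """
    ds_multi.to_netcdf(path)
    # Without a `chunks` argument, open_dataset does not create dask arrays,
    # so later operations run on ordinary (lazily loaded) NumPy data.
    return xr.open_dataset(path)


def rolling_time_mean(ds, window, dim="Time"):
    """Centered moving average along `dim` (the dimension name is an assumption)."""
    return ds.rolling({dim: window}, center=True).mean()
```

A caller would pass the multi-file data set and a scratch file name to `to_single_file`, then apply `rolling_time_mean` to the result. With the default `min_periods`, the first and last entries of a centered window come back as NaN.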

xylar commented Apr 5, 2018

@milenaveneziani, could you check if this works for you in whatever environment(s) you have handy?

# (without dask)
dsPreprocessed = dsPreprocessed.drop('xtime')
write_netcdf(dsPreprocessed, self.preprocessedFileName)
dsPreprocessed = xarray.open_dataset(self.preprocessedFileName)
Collaborator Author

@pwolfram, if you have time to take a look at this, it's mainly a question of seeing if you're good with this solution for converting a multi-file data set to a single file data set or if you'd suggest some other way of handling the issue.

Contributor

My understanding from S Hoyer is that this is the preferred way to handle these types of problems. The only issue I foresee with this is that as data becomes larger, this doesn't scale too well. However, recent work has been to develop parallel writing functionality into xarray so I wouldn't worry about this in the short term.

I think this is a reasonable way to convert a multi-file data set into a single-file data set 👍
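
The scaling concern can be checked cheaply before committing to the round trip. The sketch below relies on `Dataset.nbytes`, which reports the uncompressed size of all variables and does not need to compute dask arrays. The helper names and the 1000 MB default limit are hypothetical, chosen only for illustration.

```python
import numpy as np
import xarray as xr


def estimated_megabytes(ds):
    """Uncompressed size of all variables in `ds`, in megabytes."""
    return ds.nbytes / 1.0e6


def small_enough_to_round_trip(ds, limit_mb=1000.0):
    """True if writing `ds` to a single file looks affordable.

    `limit_mb` is an arbitrary, hypothetical threshold, not a project setting.
    """
    return estimated_megabytes(ds) <= limit_mb
```

For a time series of a few scalar diagnostics, the estimate is typically far below any sensible limit, which matches the observation above that the generated file is tiny.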

@pwolfram pwolfram left a comment

Looks good to me

'not be plotted.')
preprocessedReferenceRunName = 'None'

# rolling mean seems to have trouble with dask data sets to we
Contributor

Minor typo: 'to' should be 'so'

This is accomplished by writing out the multi-file data set and
reading it in again as a single-file data set.
@xylar xylar force-pushed the fix_moving_average branch from e188671 to 668f697 Compare April 12, 2018 14:26

xylar commented Apr 12, 2018

@milenaveneziani, is this something you might be able to test sometime soon?

@milenaveneziani

sorry for the delay @xylar: I'll be testing this today.

@milenaveneziani

@xylar: if I test this on edison, will I be using the xarray/dask version that was causing the problem?

xylar commented Apr 12, 2018

@milenaveneziani, if you use e3sm-unified/1.1.3, I think that is the right version. Even if not, the important thing is that things work with that particular version.

@milenaveneziani milenaveneziani left a comment

Tested on edison with e3sm_unified_1.1.3 and all worked fine.

xylar commented Apr 12, 2018

Thanks, @milenaveneziani!

@xylar xylar merged commit dae9187 into MPAS-Dev:develop Apr 12, 2018
@xylar xylar deleted the fix_moving_average branch April 12, 2018 20:19