Skip to content

Design Document: Moving variable mapping out of mpas_xarray #108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Feb 10, 2017

This design document describes a reorganization of the code for loading xarray datasets to pull out MPAS-Analysis specific functionality out of the mpas_xarray module.

@xylar
Copy link
Collaborator Author

xylar commented Feb 10, 2017

@pwolfram, please let me know if this design document meets our expectations. And, of course, I'd like to make sure you're okay with its contents.

I also did a considerable amount of work this evening to rename function and variable names to better conform to PEP8 conventions.

Finally, I decided to go with a module called dataset within shared and put MpasAnalysisXarray in there. There's also a utility.py module that contains just open_multifile_dataset, which is the functionI expect most users and analysis scripts will use almost exclusively, rather than invoking MpasAnalysisXarray or mpas_xarray directly.

Copy link
Contributor

@pwolfram pwolfram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good but just a little more clarity / justification will help as noted in this review.

MpasXarray class is created with two methods, `open_multifile_dataset` and
`_preprocess` that can be overloaded by subclasses. A subclass,
MpasAnalysisXarray, is also added in which these methods are overloaded to
add support for (among other things) mapping variables.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, this seems like it may be an ok way to go but we need to address the question of whether subclassing or class / functional composition is better. There are pros / cons to both but at this point subclassing seems fairly obvious.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwolfram, my suggested implementation is definitely subclassing, as that's what I've already implemented. I'm not sure what you mean by class/functional composition. If you'd like to clarify the pros of that approach, I'm definitely willing to consider an alternative. I guess the pros would have to be significant for me to want to redo what I've done.


MPAS-Analysis specific functionality such as variable mapping should be
removed from mpas_xarray so it can remain an independent module, requiring
minimal modification to accommodate MPAS-Analysis' needs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


There should be a way to perform MPAS-Analysis specific functionality such as
mapping variables during preprocessing. This functionality should be
relatively easy to add to as new preprocessing needs arise.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

developing mpas_xarray and its analysis-specific derived reader.
It also risks confusing a new developer by having seemingly redundant
functionality in two places. However, it seems to me to be the
cleanest way to satisfy both requirements.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back to the idea of class and/or functional composition, what is the benefit of subclassing in your mind over these alternative approaches, which may be simpler for new programmers to understand? To be fair, I'm ok with subclassing but just want to make sure we have considered the alternative approach via just functional or class-composition approaches. The reason I bring this up is because I don't think we will need to subclass the xarray interace multiple times and often subclassing makes code harder, not easier, to understand unless one is using a model really suited to subclassing, e.g., numerical method that could take quad, hex, or triangular grids, for instance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm learning your terminology as I go. Now that I understand what functional and class-composition approaches are, I think a functional approach could be a clear alternative. A class-composition approach might not work without effectively combining it with a functional approach because MpasXarray needs to know what preprocess function to call.

Over all, I'm in favor of considering a purely functional approach. I will see how far I can get with that modification. If it works, it will probably make things simpler. Effectively, we would be returning to mpas_xarray as it was with some clean up and an additional open_multifile_dataset function.

- `_assert_valid_selections`
- `_ensure_list`
- `_get_datetimes`
- `_remove_repeated_time_index`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may actually want this public, e.g., if we load and join two separate parts of a dataset at the user-level.

- I have removed the ability to run `mpas_xarray.py` as a script and the associated
tests. This is on the premise that 1) the test were outdated and would have
needed to be updated to work with the current code and 2) unit testing in
`test/test_mpas_xarray.py` takes care of this capability in a better way.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

needed to be updated to work with the current code and 2) unit testing in
`test/test_mpas_xarray.py` takes care of this capability in a better way.
- I have tried to make variable names a bit more verbose in various places
(e.g. dataset in place of ds).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep ds, this is an xarray-standard and removes versus adds clarity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree but I will switch back to ds.

- I have tried to make variable names a bit more verbose in various places
(e.g. dataset in place of ds).
- I have tried to improve the docstrings using a syntax that should be useful
for generating documentation later on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

(e.g. dataset in place of ds).
- I have tried to improve the docstrings using a syntax that should be useful
for generating documentation later on.
- I have update unit testing to work with the new inerface, notably the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'll try to complete this thought...

utility.py
- mpas_xarray/
__init__.py
mpas_xarray.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the dataset name for the folder is most intuitive, perhaps there is something more specific to the task of having variables changing from model versions, e.g., generalized_reader or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still run the risk of the question being asked why the separate approaches. Obviously mpas_xarray is simpler. Perhaps we want to do a rename: mpas_xarray_interface.py or some such to indicate it is the simplest reader available?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, while I'm willing to go along with it, I have the same questions about why there's a separate approach. I don't see using mpas_xarray outside of MPAS-Analysis very often if at all myself. But I'm willing to respect your wishes to keep it autonomous.

But I'm not sure the rename to mpas_xarray_interface.py does much to clarify that this is the simplest reader available. I think the directory structure alone isn't going to do that unless we renamed mpas_xarray to something like mpas_simple_reader and dataset to generalized_analysis_reader or similar. This, of course, messes up your mpas_xarray branding.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that dataset should be named something more descriptive. I'm good with generalized_reader. I suppose the file within generalized_reader would also be generalized_reader.py?

@xylar
Copy link
Collaborator Author

xylar commented Feb 15, 2017

@pwolfram, thanks for the review. I will try a functional approach instead, which will mean we return to something much more similar to mpas_xarray as it currently exists. I will let you know when I have a revised design doc and associated implementation branch for you to look at.

@xylar
Copy link
Collaborator Author

xylar commented Feb 15, 2017

@pwolfram, thank you again for your review. Upon considering your thoughts, I realized this could, indeed, all be accomplished with functions instead of classes. The only thing that is minorly troubling is a bit of redundancy in the open_multifile_dataset functions in both mpas_xarray and the new generalized_reader that I couldn't figure out a clean way to avoid.

I have made extensive edits to the design document based on the changes I've made. I've also updated the associated test branch and can confirm that everything seems to be working fine with the changes.

I would welcome additional comments or a look over the design document to make sure it now meets with your approval.

Copy link
Contributor

@pwolfram pwolfram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, here are a few minor comments but I think the design and approach here is clearer than previously noted. There are just a few comments, e.g., add some pseudo-code, but I would say that we can tentatively move on to the code review for now and only revisit this design if we encounter real problems.

that does address the requirement.)

A new module, `generalized_reader` will also be added with its own
`open_multifile_dataset` function. This version takes additional arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would use this as

from generalized_reader import open_multifile_dataset

right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice because it is clear that there is specific functionality being used for the opening operation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the way it's being used.

these examples should make it clear which version of
`open_multifile_dataset` is most appropriate. Nevertheless, clear
documentation of `generalized_reader` and `mpas_xarray`, and their
differences are needed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please put some API code in the design document here to clarify usage, e.g., so a user doesn't have to dig through the existing code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that makes sense.

@xylar
Copy link
Collaborator Author

xylar commented Feb 16, 2017

I addressed @pwolfram's latest comment. I'm going to go ahead and squash the commits. I will leave the design doc open until #104 is merged but I hope it's done.

@xylar
Copy link
Collaborator Author

xylar commented Feb 16, 2017

Commits have been squashed. @pwolfram, go ahead and merge this when you're ready.

@xylar xylar force-pushed the design_doc_variable_mapping_reorg branch from 01238e6 to c67f71e Compare February 16, 2017 22:32
@milenaveneziani
Copy link
Collaborator

milenaveneziani commented Feb 16, 2017

I have just merged this as well, but for some reason it doesn't show here..
Update: never mind, I had forgotten to fetch the latest rebase.

@milenaveneziani milenaveneziani merged commit c67f71e into MPAS-Dev:develop Feb 16, 2017
@xylar xylar deleted the design_doc_variable_mapping_reorg branch February 23, 2017 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants