-
Notifications
You must be signed in to change notification settings - Fork 52
Design Document: Moving variable mapping out of mpas_xarray #108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design Document: Moving variable mapping out of mpas_xarray #108
Conversation
@pwolfram, please let me know if this design document meets our expectations. And, of course, I'd like to make sure you're okay with its contents. I also did a considerable amount of work this evening to rename function and variable names to better conform to PEP8 conventions. Finally, I decided to go with a module called |
9d78d61
to
e6ce8d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good but just a little more clarity / justification will help as noted in this review.
design_docs/mpas_xarray_class.md
Outdated
MpasXarray class is created with two methods, `open_multifile_dataset` and | ||
`_preprocess` that can be overloaded by subclasses. A subclass, | ||
MpasAnalysisXarray, is also added in which these methods are overloaded to | ||
add support for (among other things) mapping variables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, this seems like it may be an ok way to go but we need to address the question of whether subclassing or class / functional composition is better. There are pros / cons to both but at this point subclassing seems fairly obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, my suggested implementation is definitely subclassing, as that's what I've already implemented. I'm not sure what you mean by class/functional composition. If you'd like to clarify the pros of that approach, I'm definitely willing to consider an alternative. I guess the pros would have to be significant for me to want to redo what I've done.
design_docs/mpas_xarray_class.md
Outdated
|
||
MPAS-Analysis specific functionality such as variable mapping should be | ||
removed from mpas_xarray so it can remain an independent module, requiring | ||
minimal modification to accommodate MPAS-Analysis' needs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
design_docs/mpas_xarray_class.md
Outdated
|
||
There should be a way to perform MPAS-Analysis specific functionality such as | ||
mapping variables during preprocessing. This functionality should be | ||
relatively easy to add to as new preprocessing needs arise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
design_docs/mpas_xarray_class.md
Outdated
developing mpas_xarray and its analysis-specific derived reader. | ||
It also risks confusing a new developer by having seemingly redundant | ||
functionality in two places. However, it seems to me to be the | ||
cleanest way to satisfy both requirements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going back to the idea of class and/or functional composition, what is the benefit of subclassing in your mind over these alternative approaches, which may be simpler for new programmers to understand? To be fair, I'm ok with subclassing but just want to make sure we have considered the alternative approach via just functional or class-composition approaches. The reason I bring this up is because I don't think we will need to subclass the xarray interace multiple times and often subclassing makes code harder, not easier, to understand unless one is using a model really suited to subclassing, e.g., numerical method that could take quad, hex, or triangular grids, for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'm learning your terminology as I go. Now that I understand what functional and class-composition approaches are, I think a functional approach could be a clear alternative. A class-composition approach might not work without effectively combining it with a functional approach because MpasXarray needs to know what preprocess function to call.
Over all, I'm in favor of considering a purely functional approach. I will see how far I can get with that modification. If it works, it will probably make things simpler. Effectively, we would be returning to mpas_xarray as it was with some clean up and an additional open_multifile_dataset
function.
design_docs/mpas_xarray_class.md
Outdated
- `_assert_valid_selections` | ||
- `_ensure_list` | ||
- `_get_datetimes` | ||
- `_remove_repeated_time_index` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may actually want this public, e.g., if we load and join two separate parts of a dataset at the user-level.
design_docs/mpas_xarray_class.md
Outdated
- I have removed the ability to run `mpas_xarray.py` as a script and the associated | ||
tests. This is on the premise that 1) the test were outdated and would have | ||
needed to be updated to work with the current code and 2) unit testing in | ||
`test/test_mpas_xarray.py` takes care of this capability in a better way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
design_docs/mpas_xarray_class.md
Outdated
needed to be updated to work with the current code and 2) unit testing in | ||
`test/test_mpas_xarray.py` takes care of this capability in a better way. | ||
- I have tried to make variable names a bit more verbose in various places | ||
(e.g. dataset in place of ds). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep ds
, this is an xarray-standard and removes versus adds clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree but I will switch back to ds
.
design_docs/mpas_xarray_class.md
Outdated
- I have tried to make variable names a bit more verbose in various places | ||
(e.g. dataset in place of ds). | ||
- I have tried to improve the docstrings using a syntax that should be useful | ||
for generating documentation later on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
design_docs/mpas_xarray_class.md
Outdated
(e.g. dataset in place of ds). | ||
- I have tried to improve the docstrings using a syntax that should be useful | ||
for generating documentation later on. | ||
- I have update unit testing to work with the new inerface, notably the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'll try to complete this thought...
design_docs/mpas_xarray_class.md
Outdated
utility.py | ||
- mpas_xarray/ | ||
__init__.py | ||
mpas_xarray.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the dataset
name for the folder is most intuitive, perhaps there is something more specific to the task of having variables changing from model versions, e.g., generalized_reader
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still run the risk of the question being asked why the separate approaches. Obviously mpas_xarray is simpler. Perhaps we want to do a rename: mpas_xarray_interface.py
or some such to indicate it is the simplest reader available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, while I'm willing to go along with it, I have the same questions about why there's a separate approach. I don't see using mpas_xarray
outside of MPAS-Analysis
very often if at all myself. But I'm willing to respect your wishes to keep it autonomous.
But I'm not sure the rename to mpas_xarray_interface.py
does much to clarify that this is the simplest reader available. I think the directory structure alone isn't going to do that unless we renamed mpas_xarray
to something like mpas_simple_reader
and dataset
to generalized_analysis_reader
or similar. This, of course, messes up your mpas_xarray
branding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that dataset
should be named something more descriptive. I'm good with generalized_reader
. I suppose the file within generalized_reader
would also be generalized_reader.py
?
@pwolfram, thanks for the review. I will try a functional approach instead, which will mean we return to something much more similar to |
@pwolfram, thank you again for your review. Upon considering your thoughts, I realized this could, indeed, all be accomplished with functions instead of classes. The only thing that is minorly troubling is a bit of redundancy in the I have made extensive edits to the design document based on the changes I've made. I've also updated the associated test branch and can confirm that everything seems to be working fine with the changes. I would welcome additional comments or a look over the design document to make sure it now meets with your approval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, here are a few minor comments but I think the design and approach here is clearer than previously noted. There are just a few comments, e.g., add some pseudo-code, but I would say that we can tentatively move on to the code review for now and only revisit this design if we encounter real problems.
that does address the requirement.) | ||
|
||
A new module, `generalized_reader` will also be added with its own | ||
`open_multifile_dataset` function. This version takes additional arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would use this as
from generalized_reader import open_multifile_dataset
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice because it is clear that there is specific functionality being used for the opening operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the way it's being used.
these examples should make it clear which version of | ||
`open_multifile_dataset` is most appropriate. Nevertheless, clear | ||
documentation of `generalized_reader` and `mpas_xarray`, and their | ||
differences are needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please put some API code in the design document here to clarify usage, e.g., so a user doesn't have to dig through the existing code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that makes sense.
Commits have been squashed. @pwolfram, go ahead and merge this when you're ready. |
01238e6
to
c67f71e
Compare
I have just merged this as well, but for some reason it doesn't show here.. |
This design document describes a reorganization of the code for loading xarray datasets to pull out MPAS-Analysis specific functionality out of the mpas_xarray module.