|
| 1 | +<h1> Title: Moving variable mapping outside of mpas_xarray <br> |
| 2 | +Xylar Asay-Davis <br> |
| 3 | +date: 2017/02/10 <br> |
| 4 | +</h1> |
| 5 | +<h2> Summary </h2> |
| 6 | +In discussions with @pwolfram, it became clear that we would like to keep |
| 7 | +mpas_xarray as general as possible, rather than adding code specific to |
| 8 | +MPAS-Analysis. In particular, the capability for mapping variable names |
| 9 | +that is currently part of mpas_xarray is likely a capability that only |
| 10 | +MPAS-Analysis will need when opening xarray data sets. Likewise, there is |
| 11 | +a desire for mpax_xarray not to use any of the functionality outside of its |
| 12 | +own module so that it remains autonomous from MPAS-Analysis. |
| 13 | + |
| 14 | +At the same time, it is desirable for efficiency and parallelism to perform |
| 15 | +certain operations during the preprocessing step within xarray, rather than |
| 16 | +constructing a data set first and then (in serial) performing manipulations |
| 17 | +(e.g. creating a time coordinate and slicing variables). |
| 18 | + |
| 19 | +To solve these problems, this document proposes a solution in which a new |
| 20 | +MpasXarray class is created with two methods, `open_multifile_dataset` and |
| 21 | +`_preprocess` that can be overloaded by subclasses. A subclass, |
| 22 | +MpasAnalysisXarray, is also added in which these methods are overloaded to |
| 23 | +add support for (among other things) mapping variables. |
| 24 | + |
| 25 | +The solution will be tested by making sure it produces bit-for-bit identical |
| 26 | +results to those from the develop branch for typical test cases on LANL IC |
| 27 | +and Edison. |
| 28 | + |
| 29 | +<h1> Requirements </h1> |
| 30 | + |
| 31 | +<h2> Requirement: mpas_xarray does not include MPAS-Analysis specific |
| 32 | +functionality <br> |
| 33 | +Date last modified: 2017/02/10 <br> |
| 34 | +Contributors: Xylar Asay-Davis |
| 35 | +</h2> |
| 36 | + |
| 37 | +MPAS-Analysis specific functionality such as variable mapping should be |
| 38 | +removed from mpas_xarray so it can remain an independent module, requiring |
| 39 | +minimal modification to accommodate MPAS-Analysis' needs. |
| 40 | + |
| 41 | +<h2> Requirement: MPAS-Analysis specific functionality should be supported in |
| 42 | +xarray preprossing <br> |
| 43 | +Date last modified: 2017/02/10 <br> |
| 44 | +Contributors: Xylar Asay-Davis |
| 45 | +</h2> |
| 46 | + |
| 47 | +There should be a way to perform MPAS-Analysis specific functionality such as |
| 48 | +mapping variables during preprocessing. This functionality should be |
| 49 | +relatively easy to add to as new preprocessing needs arise. |
| 50 | + |
| 51 | + |
| 52 | +<h1> Algorithmic Formulations (optional) </h1> |
| 53 | + |
| 54 | +<h2> Algorithm: mpas_xarray does not include MPAS-Analysis specific |
| 55 | +functionality <br> |
| 56 | +Date last modified: 2017/02/10 <br> |
| 57 | +Contributors: Xylar Asay-Davis |
| 58 | +</h2> |
| 59 | + |
| 60 | +All functions and function arguments related to variable mapping will |
| 61 | +be removed from mpas_xarray and moved elsewhere. |
| 62 | + |
| 63 | +<h2> Algorithm: MPAS-Analysis specific functionality should be supported in |
| 64 | +xarray preprossing <br> |
| 65 | +Date last modified: 2017/02/10 <br> |
| 66 | +Contributors: Xylar Asay-Davis |
| 67 | +</h2> |
| 68 | + |
| 69 | +While this requirement seems almost contradictory to the previous |
| 70 | +requirement, both can be accommodated by using classes and inheritance. |
| 71 | +An MpasXarray class will be added to mpas_xarray, including a method |
| 72 | +(`_preprocess()`) for performing preprocessing. Then, a subclass will |
| 73 | +be created (MpasAnalysisXarray) that includes the functionality that |
| 74 | +is specific to MPAS-Analysis by overloading the preprocessing method |
| 75 | +from MpasXarray. |
| 76 | + |
| 77 | +This solution may add a level of complexity, particularly for |
| 78 | +developing mpas_xarray and its analysis-specific derived reader. |
| 79 | +It also risks confusing a new developer by having seemingly redundant |
| 80 | +functionality in two places. However, it seems to me to be the |
| 81 | +cleanest way to satisfy both requirements. |
| 82 | + |
| 83 | + |
| 84 | +<h1> Design and Implementation </h1> |
| 85 | + |
| 86 | +<h2> Implementation: mpas_xarray does not include MPAS-Analysis specific |
| 87 | +functionality <br> |
| 88 | +Date last modified: 2017/02/10 <br> |
| 89 | +Contributors: Xylar Asay-Davis |
| 90 | +</h2> |
| 91 | + |
| 92 | +A test branch can be found here |
| 93 | +[xylar/MPAS-Analysis/variable_mapping_reorg](https://github.com/xylar/MPAS-Analysis/tree/variable_mapping_reorg) |
| 94 | + |
| 95 | +I have removed `map_variable` and `rename_variables` from `mpas_xarray`. |
| 96 | +I also removed any mention of the variable map from the rest of `mpas_xarray`. |
| 97 | + |
| 98 | +This branch also includes several other cleanup operations that are not |
| 99 | +addressing any requirements. These include: |
| 100 | + - I added a new helper function, `open_multifile_dataset`, for opening an |
| 101 | + xarray data set in a single, simple command without reference to the |
| 102 | + preprocessor. This function should make opening new data sets more |
| 103 | + intuitive for mpas_xarray users. Calling the |
| 104 | + `MpasXarray.open_multifile_dataset` allows a bit more flexibility at the |
| 105 | + cost of being a little bit less compact. |
| 106 | + - making several utility functions non-public (it is unclear to me why anyone |
| 107 | + want to call these directly): |
| 108 | + - `_assert_valid_datetimes` |
| 109 | + - `_assert_valid_selections` |
| 110 | + - `_ensure_list` |
| 111 | + - `_get_datetimes` |
| 112 | + - `_remove_repeated_time_index` |
| 113 | + - I have removed the ability to run `mpas_xarray.py` as a script and the associated |
| 114 | + tests. This is on the premise that 1) the test were outdated and would have |
| 115 | + needed to be updated to work with the current code and 2) unit testing in |
| 116 | + `test/test_mpas_xarray.py` takes care of this capability in a better way. |
| 117 | + - I have tried to make variable names a bit more verbose in various places |
| 118 | + (e.g. dataset in place of ds). |
| 119 | + - I have tried to improve the docstrings using a syntax that should be useful |
| 120 | + for generating documentation later on. |
| 121 | + - I have update unit testing to work with the new inerface, notably the |
| 122 | + |
| 123 | +<h2> Implementation: MPAS-Analysis specific functionality should be supported in |
| 124 | +xarray preprossing <br> |
| 125 | +Date last modified: 2017/02/10 <br> |
| 126 | +Contributors: Xylar Asay-Davis |
| 127 | +</h2> |
| 128 | + |
| 129 | +In the same branch as above, I have added two classes, `MpasXarray` in the |
| 130 | +`mpas_xarray` module and `MpasAnalysisXarray` in the new `dataset` module. |
| 131 | +The file structure is as follows: |
| 132 | +``` |
| 133 | +mpas_analysis/shared/ |
| 134 | + - dataset/ |
| 135 | + __init__.py |
| 136 | + MpasAnalysisXarray.py |
| 137 | + utility.py |
| 138 | + - mpas_xarray/ |
| 139 | + __init__.py |
| 140 | + mpas_xarray.py |
| 141 | +``` |
| 142 | +`utility.py` contains a function `open_multifile_dataset` that is similar to |
| 143 | +the one in `mpas_xarray` but with additional arguments needed by analysis: |
| 144 | + - `variable_map`, a map between MPAS and MPAS-Analysis variable names |
| 145 | + - `start_date`, the start date of the analysis |
| 146 | + - `end_date`, the end date of the analysis |
| 147 | + |
| 148 | +The `MapsAnalysisXarray` class overloads the `_preprocess` method in |
| 149 | +`MpasXarray` to map variable names, then call `MpasXarray._preprocess` to |
| 150 | +do the rest of the preprocessing as normal. `open_multifile_dataset` is |
| 151 | +also overloaded to call `MpasXarray.open_multifile_dataset` and then slice |
| 152 | +the `Time` coordinate to only include dates between `start_date` and |
| 153 | +`end_date`. |
| 154 | + |
| 155 | +<h1> Testing </h1> |
| 156 | + |
| 157 | +<h2> Testing and Validation: MPAS-Analysis specific functionality should be supported in |
| 158 | +xarray preprossing <br> |
| 159 | +Date last modified: 2017/02/10 <br> |
| 160 | +Contributors: Xylar Asay-Davis |
| 161 | +</h2> |
| 162 | + |
| 163 | +In [xylar/MPAS-Analysis/variable_mapping_reorg](https://github.com/xylar/MPAS-Analysis/tree/variable_mapping_reorg), |
| 164 | +the unit testing for mpas_xarray has been updated. This includes moving unit testing for |
| 165 | +variable mapping elsewhere. |
| 166 | + |
| 167 | +I have tested, to the extent that I am able, that the results of a G240km ACME test case |
| 168 | +are bit-for-bit identical to those from `develop`. (However, I'm having trouble on my |
| 169 | +laptop with `ncremap` giving me a fill-value related error.) |
| 170 | + |
| 171 | +I will make sure all tests with config files in the `configs/lanl` directory produce |
| 172 | +bit-for-bit results with the current `develop`. |
| 173 | + |
| 174 | +<h2> Testing and Validation: MPAS-Analysis specific functionality should be supported in |
| 175 | +xarray preprossing <br> |
| 176 | +Date last modified: 2017/02/10 <br> |
| 177 | +Contributors: Xylar Asay-Davis |
| 178 | +</h2> |
| 179 | + |
| 180 | +Largely, the same as above. |
| 181 | + |
| 182 | +I have added unit testing for `MpasAnalysisXarray` (mostly via the standalone |
| 183 | +`dataset.utility.open_multifile_dataset` function). These tests ensure that: |
| 184 | + - variable mapping works as expected |
| 185 | + - start and end dates work as expected |
0 commit comments