Skip to content

Commit e6ce8d8

Browse files
committed
Design document for MpasXarray class
This design document describes why an MpaxXarray class and an MpasAnalysisXarray subclass are needed for preprocessing xarray datasets.
1 parent 6bb9904 commit e6ce8d8

File tree

1 file changed

+185
-0
lines changed

1 file changed

+185
-0
lines changed

design_docs/mpas_xarray_class.md

Lines changed: 185 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,185 @@
1+
<h1> Title: Moving variable mapping outside of mpas_xarray <br>
2+
Xylar Asay-Davis <br>
3+
date: 2017/02/10 <br>
4+
</h1>
5+
<h2> Summary </h2>
6+
In discussions with @pwolfram, it became clear that we would like to keep
7+
mpas_xarray as general as possible, rather than adding code specific to
8+
MPAS-Analysis. In particular, the capability for mapping variable names
9+
that is currently part of mpas_xarray is likely a capability that only
10+
MPAS-Analysis will need when opening xarray data sets. Likewise, there is
11+
a desire for mpax_xarray not to use any of the functionality outside of its
12+
own module so that it remains autonomous from MPAS-Analysis.
13+
14+
At the same time, it is desirable for efficiency and parallelism to perform
15+
certain operations during the preprocessing step within xarray, rather than
16+
constructing a data set first and then (in serial) performing manipulations
17+
(e.g. creating a time coordinate and slicing variables).
18+
19+
To solve these problems, this document proposes a solution in which a new
20+
MpasXarray class is created with two methods, `open_multifile_dataset` and
21+
`_preprocess` that can be overloaded by subclasses. A subclass,
22+
MpasAnalysisXarray, is also added in which these methods are overloaded to
23+
add support for (among other things) mapping variables.
24+
25+
The solution will be tested by making sure it produces bit-for-bit identical
26+
results to those from the develop branch for typical test cases on LANL IC
27+
and Edison.
28+
29+
<h1> Requirements </h1>
30+
31+
<h2> Requirement: mpas_xarray does not include MPAS-Analysis specific
32+
functionality <br>
33+
Date last modified: 2017/02/10 <br>
34+
Contributors: Xylar Asay-Davis
35+
</h2>
36+
37+
MPAS-Analysis specific functionality such as variable mapping should be
38+
removed from mpas_xarray so it can remain an independent module, requiring
39+
minimal modification to accommodate MPAS-Analysis' needs.
40+
41+
<h2> Requirement: MPAS-Analysis specific functionality should be supported in
42+
xarray preprossing <br>
43+
Date last modified: 2017/02/10 <br>
44+
Contributors: Xylar Asay-Davis
45+
</h2>
46+
47+
There should be a way to perform MPAS-Analysis specific functionality such as
48+
mapping variables during preprocessing. This functionality should be
49+
relatively easy to add to as new preprocessing needs arise.
50+
51+
52+
<h1> Algorithmic Formulations (optional) </h1>
53+
54+
<h2> Algorithm: mpas_xarray does not include MPAS-Analysis specific
55+
functionality <br>
56+
Date last modified: 2017/02/10 <br>
57+
Contributors: Xylar Asay-Davis
58+
</h2>
59+
60+
All functions and function arguments related to variable mapping will
61+
be removed from mpas_xarray and moved elsewhere.
62+
63+
<h2> Algorithm: MPAS-Analysis specific functionality should be supported in
64+
xarray preprossing <br>
65+
Date last modified: 2017/02/10 <br>
66+
Contributors: Xylar Asay-Davis
67+
</h2>
68+
69+
While this requirement seems almost contradictory to the previous
70+
requirement, both can be accommodated by using classes and inheritance.
71+
An MpasXarray class will be added to mpas_xarray, including a method
72+
(`_preprocess()`) for performing preprocessing. Then, a subclass will
73+
be created (MpasAnalysisXarray) that includes the functionality that
74+
is specific to MPAS-Analysis by overloading the preprocessing method
75+
from MpasXarray.
76+
77+
This solution may add a level of complexity, particularly for
78+
developing mpas_xarray and its analysis-specific derived reader.
79+
It also risks confusing a new developer by having seemingly redundant
80+
functionality in two places. However, it seems to me to be the
81+
cleanest way to satisfy both requirements.
82+
83+
84+
<h1> Design and Implementation </h1>
85+
86+
<h2> Implementation: mpas_xarray does not include MPAS-Analysis specific
87+
functionality <br>
88+
Date last modified: 2017/02/10 <br>
89+
Contributors: Xylar Asay-Davis
90+
</h2>
91+
92+
A test branch can be found here
93+
[xylar/MPAS-Analysis/variable_mapping_reorg](https://github.com/xylar/MPAS-Analysis/tree/variable_mapping_reorg)
94+
95+
I have removed `map_variable` and `rename_variables` from `mpas_xarray`.
96+
I also removed any mention of the variable map from the rest of `mpas_xarray`.
97+
98+
This branch also includes several other cleanup operations that are not
99+
addressing any requirements. These include:
100+
- I added a new helper function, `open_multifile_dataset`, for opening an
101+
xarray data set in a single, simple command without reference to the
102+
preprocessor. This function should make opening new data sets more
103+
intuitive for mpas_xarray users. Calling the
104+
`MpasXarray.open_multifile_dataset` allows a bit more flexibility at the
105+
cost of being a little bit less compact.
106+
- making several utility functions non-public (it is unclear to me why anyone
107+
want to call these directly):
108+
- `_assert_valid_datetimes`
109+
- `_assert_valid_selections`
110+
- `_ensure_list`
111+
- `_get_datetimes`
112+
- `_remove_repeated_time_index`
113+
- I have removed the ability to run `mpas_xarray.py` as a script and the associated
114+
tests. This is on the premise that 1) the test were outdated and would have
115+
needed to be updated to work with the current code and 2) unit testing in
116+
`test/test_mpas_xarray.py` takes care of this capability in a better way.
117+
- I have tried to make variable names a bit more verbose in various places
118+
(e.g. dataset in place of ds).
119+
- I have tried to improve the docstrings using a syntax that should be useful
120+
for generating documentation later on.
121+
- I have update unit testing to work with the new inerface, notably the
122+
123+
<h2> Implementation: MPAS-Analysis specific functionality should be supported in
124+
xarray preprossing <br>
125+
Date last modified: 2017/02/10 <br>
126+
Contributors: Xylar Asay-Davis
127+
</h2>
128+
129+
In the same branch as above, I have added two classes, `MpasXarray` in the
130+
`mpas_xarray` module and `MpasAnalysisXarray` in the new `dataset` module.
131+
The file structure is as follows:
132+
```
133+
mpas_analysis/shared/
134+
- dataset/
135+
__init__.py
136+
MpasAnalysisXarray.py
137+
utility.py
138+
- mpas_xarray/
139+
__init__.py
140+
mpas_xarray.py
141+
```
142+
`utility.py` contains a function `open_multifile_dataset` that is similar to
143+
the one in `mpas_xarray` but with additional arguments needed by analysis:
144+
- `variable_map`, a map between MPAS and MPAS-Analysis variable names
145+
- `start_date`, the start date of the analysis
146+
- `end_date`, the end date of the analysis
147+
148+
The `MapsAnalysisXarray` class overloads the `_preprocess` method in
149+
`MpasXarray` to map variable names, then call `MpasXarray._preprocess` to
150+
do the rest of the preprocessing as normal. `open_multifile_dataset` is
151+
also overloaded to call `MpasXarray.open_multifile_dataset` and then slice
152+
the `Time` coordinate to only include dates between `start_date` and
153+
`end_date`.
154+
155+
<h1> Testing </h1>
156+
157+
<h2> Testing and Validation: MPAS-Analysis specific functionality should be supported in
158+
xarray preprossing <br>
159+
Date last modified: 2017/02/10 <br>
160+
Contributors: Xylar Asay-Davis
161+
</h2>
162+
163+
In [xylar/MPAS-Analysis/variable_mapping_reorg](https://github.com/xylar/MPAS-Analysis/tree/variable_mapping_reorg),
164+
the unit testing for mpas_xarray has been updated. This includes moving unit testing for
165+
variable mapping elsewhere.
166+
167+
I have tested, to the extent that I am able, that the results of a G240km ACME test case
168+
are bit-for-bit identical to those from `develop`. (However, I'm having trouble on my
169+
laptop with `ncremap` giving me a fill-value related error.)
170+
171+
I will make sure all tests with config files in the `configs/lanl` directory produce
172+
bit-for-bit results with the current `develop`.
173+
174+
<h2> Testing and Validation: MPAS-Analysis specific functionality should be supported in
175+
xarray preprossing <br>
176+
Date last modified: 2017/02/10 <br>
177+
Contributors: Xylar Asay-Davis
178+
</h2>
179+
180+
Largely, the same as above.
181+
182+
I have added unit testing for `MpasAnalysisXarray` (mostly via the standalone
183+
`dataset.utility.open_multifile_dataset` function). These tests ensure that:
184+
- variable mapping works as expected
185+
- start and end dates work as expected

0 commit comments

Comments
 (0)