Design Document: Config File Reorganization #91
Conversation
I decided not to go too far with changes to ACME-specific options (casename and ref_casename_v0) in this reorganization. I decided to leave `yr_offset` alone for this PR, since I think any changes there are going to need their own separate discussion and PR(s).
Nice! @pwolfram, I think this process will work well for design docs. If you view the changes in my second commit as a "rich diff", you can see the edits like you would with Track Changes in Word. Also, it's easy to edit the markdown directly on GitHub once you've created the branch.
@xylar, the rich diff view is awesome and is probably what we want anyway!
Contributors: Xylar Asay-Davis
There should be a simple, intuitive method for turning on and off individual analysis modules (e.g. `ocean/ohc_timeseries`). This should replace the current approach of having a boolean `generate` flag for each analysis module in a separate config section. Preferably, there should be an equivalent method for turning on and off analysis modules from the command line that overrides that in the config file.
> there should be an equivalent method for turning on and off analysis modules from the command line that overrides that in the config file.
Do we really need / want this functionality? I can see where it would be useful for testing but I don't think it is essential for 95% of our use cases. I'd skip it unless it is quick / easy to add.
@milenaveneziani, thanks for bringing #14 back to my attention. It would be good to add it to this design if possible.
@pwolfram and @milenaveneziani, hopefully you both want this functionality because I already implemented it in #86.
The current example config file is specific to a run on Edison and should be made into a general template. Simplifications should be made to the template so that it can more easily and intuitively be modified for several analyses. Example config files should also be added for analyzing several existing runs on several different machines.
I think the current config is hacked together from multiple sources, and I'm not 100% sure it is strictly from Edison. Maybe reword:

> The current example config file is specific to a run on Edison and should be made into a general template.
Okay, I think this is pretty nitpicky but I'll change it.
To the extent possible, ACME-specific config options such as `casename` and `ref_casename_v0` should be generalized in a way that is also appropriate for MPAS-only runs.
If done right, MPAS-Analysis could be even more general than MPAS. Basically, we are just pointing the analysis to paths and then plotting variables based on some assumption of grid variables existing. This could potentially be generalized away from MPAS, similar to our renaming of namelist and stream variables, if possible. I'd make the following change:

> To the extent possible, ACME-specific config options such as `casename` and `ref_casename_v0` should be generalized in a way that is also appropriate for MPAS-only runs.
Oh, I have no interest in going down that path. This is MPAS analysis and I have no interest in supporting anything more general. This is how you get code that is so generalized it's completely unreadable and useful to no one. We can discuss this philosophy tomorrow but I'm definitely opposed to trying to generalize beyond MPAS and ACME runs.
I think the phrase "in a way that is also appropriate for MPAS-only runs." is somewhat ambiguous because there could be different uses for different cores and this assumes some type of particular codified standard, which may or may not exist. I'm not suggesting that we fully generalize this in a way that isn't helpful, just that we avoid unnecessarily hard coding things that don't absolutely have to be hard coded.
Okay, fair enough regarding generalization.
I will reword "in a way that is also appropriate for MPAS-only runs" rather than getting rid of it. To me it clarifies what the scope of MPAS-Analysis needs to be: not just ACME runs but also any other runs involving the MPAS components we support.
Currently, there is an option `yr_offset` that is intended to apply to all dates and observations. This option should be removed and possibly be replaced by an input and output reference date for each model run and observational data set. Discussion is needed as to whether these reference dates are actually needed once we find a more generalized calendar (e.g. `netcdftime`).
Great!
Hmm, I'm going to take this out of this design doc. I think getting rid of `yr_offset` is too big a task for this project. I already removed that change from the template in #86.
We need to figure out what to do about `yr_offset` and related things like that, because this appears to be a potential problem that may be causing unnecessary roadblocks.
Yes, absolutely! But not here.
The following comment describes the planned implementation in the config file.

```
# a list of which analyses to generate. Valid names are:
```

```
./run_analysis.py config.analysis --generate all,no_ocean,all_timeseries
```

(I am open to other syntax.) If the `--generate` flag is used on the command line, it will replace the `generate` option in the config file.
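As a sketch of the semantics described above, the `generate` list could be expanded into a set of enabled analysis modules by applying entries left to right. The function, module names, and category tags below are made up for illustration; they are not the actual MPAS-Analysis implementation.

```python
def expand_generate(generate, modules):
    """Hypothetical expansion of a ``generate`` list.

    ``modules`` maps each analysis module name to its set of category tags.
    Entries are applied left to right: 'all' enables everything,
    'all_<tag>' enables all modules with that tag, 'no_<tag>' disables
    them, and a bare module name enables just that module.
    """
    enabled = set()
    for entry in generate:
        if entry == 'all':
            enabled |= set(modules)
        elif entry.startswith('all_'):
            tag = entry[len('all_'):]
            enabled |= {name for name, tags in modules.items() if tag in tags}
        elif entry.startswith('no_'):
            tag = entry[len('no_'):]
            enabled -= {name for name, tags in modules.items() if tag in tags}
        elif entry in modules:
            enabled.add(entry)
    return enabled


# illustrative module/category names only
modules = {'ohcTimeSeries': {'ocean', 'timeSeries'},
           'sstTimeSeries': {'ocean', 'timeSeries'},
           'seaIceAreaVol': {'seaIce', 'timeSeries'}}

print(sorted(expand_generate(['all', 'no_ocean', 'all_timeSeries'], modules)))
```

With this left-to-right rule, `['all', 'no_ocean', 'all_timeSeries']` first enables everything, then drops the ocean modules, then re-enables anything tagged as a time series.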
Do we want the command line options to fully overwrite the config options? If so, should an effective config file be generated for documentation purposes?
To the first question, yes, absolutely. How else would the outcome be clear? If you just concatenated the command line options on the end of the list of options in the config file, it would make things much more complex rather than simpler.
Yes to the second one. We should do this anyway because there are already a few config options that are generated in the code if they aren't present in the config file. But this is not an issue for this PR to address in my opinion.
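As a sketch of that second point, the merged ("effective") configuration could be written out for provenance after command-line overrides are applied. This assumes Python's standard `configparser`; the section and option names are illustrative, not the actual MPAS-Analysis ones.

```python
import configparser
import io

# illustrative config contents, standing in for the real config file
config = configparser.ConfigParser()
config.read_string("""
[output]
generate = ['all']
""")

# a --generate flag on the command line fully replaces the config value
command_line_generate = "['all', 'no_ocean', 'all_timeSeries']"
config.set('output', 'generate', command_line_generate)

# dump the effective config so the analysis can be reproduced later
buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```

Writing the dump to a file alongside the analysis output would give a record that can be fed straight back into a rerun.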
> We should do this anyway because there are already a few config options that are generated in the code if they aren't present in the config file. But this is not an issue for this PR to address in my opinion.

I think this type of thing needs an issue so that we don't forget that this is something we ultimately need to do to keep the repo clean. Are you opposed to having an issue for output of a provenance-useful config file that can be used to reproduce the analysis? We could potentially check off #23 and/or #35 this way.
Yes, I agree that such an issue should be added. Could you take care of adding it?
See the soon-to-exist #94.
This item needs some discussion. In #86, I would like to rename `ref_casename_v0` to simply `ref_casename`. `casename` and `ref_casename_v0` are used for file names and figure titles (essentially legends). It would be useful to discuss what the relevant equivalents would be for standalone MPAS runs.
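To make the role of these options concrete, here is a hypothetical illustration of how the two case names might feed into a figure title and output file name. The function, field name, and formatting are all made up; the actual MPAS-Analysis formatting may differ.

```python
def figure_outputs(casename, ref_casename_v0, field='sst'):
    """Hypothetical sketch: build a figure title (legend) and an output
    file name from the current and reference case names."""
    title = '{} (compared to {})'.format(casename, ref_casename_v0)
    filename = '{}_{}_vs_{}.png'.format(field, casename, ref_casename_v0)
    return title, filename


# generic placeholder run names, not real ACME case names
title, filename = figure_outputs('myRun', 'refRun')
print(title)
print(filename)
```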
Obviously needs discussion in our call
@milenaveneziani had a solution that seems reasonable (with a bit of adaptation). We could have `runName` for the current run, `referenceRunName` for a reference MPAS or ACME run, and `preprocessedReferenceRunName` (or something similar) to replace `ref_casename_v0` where the data doesn't come from an MPAS run and therefore has been preprocessed.
I am definitely open to alternative solutions but that one works well for me.
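A config fragment sketching that naming scheme might look like the following; the section name and default values are illustrative only.

```
[runs]
# name of the current run (replaces casename)
runName = myRun
# a reference MPAS or ACME run to compare against (or None)
referenceRunName = None
# replaces ref_casename_v0: reference data that doesn't come from an
# MPAS run and has therefore been preprocessed (or None)
preprocessedReferenceRunName = None
```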
This will be difficult to test with CI since we currently don't have CI that can perform tests with `run_analysis.py`.
@xylar, is this something we need, e.g., CI testing for `run_analysis.py`? I think it is hard to test a full workflow whose output is a png, and I don't see any clear way to do this.
Right, I think that's not really the point of CI anyway. You want to break the code into units as opposed to trying to test the whole thing at once.
Still, until we have some example config files to use as "standard" test runs, our testing will remain pretty arbitrary.
Note, we can use Coveralls, Landscape, etc. to point to where we need to write tests if this becomes an issue.
I'm afraid I didn't follow any of that. `coveralls` and `landscape` are libraries? Could you clarify what you have in mind?
Coveralls and Landscape are GitHub "plug-ins" that can be run automatically, like CI, to provide diagnostics regarding code quality. They can indicate parts of the code that may need further testing to provide a robust and maintainable code base. At present, I'm not suggesting we use them, just that we keep them in mind if we have needs related to identifying insufficient tests in particular.
Ah, good to know! Something to look into down the road, I agree.
This will be difficult to test with CI since we currently don't have CI that can perform tests with `run_analysis.py`.
Instead, I will manually test a variety of combinations of `generate` lists (both in the config file and on the command line). I will list the tests I perform in #86.
We should be able to test the generate functionality with example config files in CI. We don't need `run_analysis.py` to do this check, just a config file and the parsers for the config file. This seems to be the right way to test and ensure that functionality isn't lost in the long term.
I think you do need one of the functions out of `run_analysis.py` (`check_generate`: https://github.com/MPAS-Dev/MPAS-Analysis/pull/86/files#diff-199f356fab800f1dc95e9c7c5567146cR38), but that would be pretty easy to use in some CI testing. I'll add that to the design doc and also to the code in #86 when I can.
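A sketch of what such a CI check might look like. The `check_generate` below is a local stand-in with an assumed signature and semantics; the real function lives in `run_analysis.py` in PR #86 and may differ.

```python
def check_generate(generate, analysis_name, categories):
    """Stand-in for the real check_generate: return True if the analysis
    module ``analysis_name`` (with the given category tags) is selected
    by the ``generate`` list, applying entries left to right."""
    enabled = False
    for entry in generate:
        if entry == 'all' or entry == analysis_name:
            enabled = True
        elif entry.startswith('all_') and entry[len('all_'):] in categories:
            enabled = True
        elif entry == 'no_' + analysis_name:
            enabled = False
        elif entry.startswith('no_') and entry[len('no_'):] in categories:
            enabled = False
    return enabled


# the kind of assertions CI could run without run_analysis.py itself;
# module/category names are illustrative
generate = ['all', 'no_ocean']
assert not check_generate(generate, 'ohcTimeSeries', {'ocean', 'timeSeries'})
assert check_generate(generate, 'seaIceAreaVol', {'seaIce', 'timeSeries'})
print('generate-list checks passed')
```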
The analysis will be tested on both MPAS and ACME runs to make sure the resulting image files have useful file names and titles (legends).
How will you assess success for this goal? Naming is a subjective process, at least for the best names. This seems like something that will require some group discussion.
Yes, of course!
A consistent convention of capitalization and underscores should be used throughout the config file.
@pwolfram and @milenaveneziani, could you please let me know if this requirement makes sense to you? If so, I would like to begin to implement it.
Sounds good to me.
```
# option:
# ./run_analysis.py config.analysis --generate \
#    all,no_ocean,all_plotTimeSeries
generate = ['all']
```
This demonstrates how I'd like to rename the different analysis scripts using CamelCase. @pwolfram and @milenaveneziani, again, I want to make sure you're good with this before I dive into implementation.
Implementation of the `config.template` file can be found [here](https://github.com/xylar/MPAS-Analysis/blob/5d5f64bde6ecf1d71f375a61783ff30f1654df01/config.template).
@milenaveneziani and @pwolfram, could you take a careful look at the latest template and make sure you're okay with it?
https://github.com/xylar/MPAS-Analysis/blob/5d5f64bde6ecf1d71f375a61783ff30f1654df01/config.template
I am going to go ahead with implementation of the necessary changes in the code itself today but want to address any concerns as soon as possible before I get too far.
Minor comments:

1. For https://github.com/xylar/MPAS-Analysis/blob/5d5f64bde6ecf1d71f375a61783ff30f1654df01/config.template#L157 -- we have the option to compare against a model, which is commented out. I think this is ok for now but just want to make sure we keep that option open for the future if possible.
2. It would be good to get to the bottom of https://github.com/xylar/MPAS-Analysis/blob/5d5f64bde6ecf1d71f375a61783ff30f1654df01/config.template#L299, but this is not essential.

I'd say go for it!
@pwolfram, I'll look into the second issue when I get to working on `ocean.modelvsobs`. My guess is the last contour value just isn't being used, which would make the diff plots weirdly asymmetric in color space. I'll update the template somehow if that's the case, or ask @vanroekel about it.
The template looks good, @xylar.
About the first issue @pwolfram brings up: I think `compareWithModel` could safely go at some point, because we will be relying on `preprocessedRunName` and `referenceRunName`, whether they are None or not, to plot results from another model. Let's keep it there for now though.
@pwolfram and @milenaveneziani, I guess this can get merged just after #86, because then we know it's served its purpose and we're not likely to do further editing.
@xylar and @milenaveneziani, LGTM.
A design document describing reorganization of the config file for increased clarity and to make it easier to control which analyses are run.