Design Document: Config File Reorganization #91


Merged
merged 7 commits into MPAS-Dev:develop on Feb 10, 2017

Conversation

xylar
Collaborator

@xylar xylar commented Jan 30, 2017

A design document describing reorganization of the config file for increased clarity and to make it easier to control which analyses are run.

I decided not to go too far with changes to ACME-specific options (casename and ref_casename_v0) in this reorganization.

I decided to leave `yr_offset` alone for this PR, since I think any changes there are going to need their own separate discussion and PR(s).
@xylar
Collaborator Author

xylar commented Jan 30, 2017

Nice! @pwolfram, I think this process will work well for design docs. If you view the changes in my second commit as a "rich diff", you can see the edits like you would with Track Changes in Word. Also, it's easy to edit the markdown directly on GitHub once you've created the branch.

@pwolfram
Contributor

@xylar, the rich diff view is awesome and is probably what we want anyway!
(screenshot: rich diff view, 2017-01-31)

Contributors: Xylar Asay-Davis

There should be a simple, intuitive method for turning on and off individual analysis modules (e.g. `ocean/ohc_timeseries`). This should replace the current approach of having a boolean `generate` flag for each analysis module in a separate config section. Preferably, there should be an equivalent method for turning on and off analysis modules from the command line that overrides that in the config file.
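As a sketch of what this might look like (the section and option names here are hypothetical, not the final design):

```
# current approach: one boolean flag per analysis module, each in its own section
[ohc_timeseries]
generate = True

[sst_modelvsobs]
generate = False

# proposed approach: a single list that controls all modules at once
[output]
generate = ['all', 'no_ocean', 'all_timeseries']
```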
Contributor

> there should be an equivalent method for turning on and off analysis modules from the command line that overrides that in the config file.

Do we really need / want this functionality? I can see where it would be useful for testing but I don't think it is essential for 95% of our use cases. I'd skip it unless it is quick / easy to add.

Collaborator

@milenaveneziani milenaveneziani Jan 31, 2017

@pwolfram: I think this was one of the ideas @xylar had to fully resolve #14. I think it would be nice to have it.
btw, the rich view thing is very nice, but then these review comments disappear.

Contributor

@milenaveneziani, thanks for bringing #14 back to my attention. It would be good to address it in this design if possible.

Collaborator Author

@pwolfram and @milenaveneziani, hopefully you both want this functionality because I already implemented it in #86.


The current example config file is specific to a run on Edison and should be made into a general template. Simplifications should be made to the template so that it can more easily and intuitively be modified for several analyses. Example config files should also be added for analyzing several existing runs on several different machines.
Contributor

I think the current config is hacked together from multiple sources and am not 100% sure it is strictly Edison. Maybe we want to reword:

> The current example config file is specific to a run on Edison and should be made into a general template.

Collaborator Author

Okay, I think this is pretty nit picky but I'll change it.


To the extent possible, ACME-specific config options such as `casename` and `ref_casename_v0` should be generalized in a way that is also appropriate for MPAS-only runs.
Contributor

If done right, MPAS-Analysis could be even more general than MPAS. Basically, we are just pointing the analysis to paths and then plotting variables based on some assumption of grid variables existing. This could potentially be generalized away from MPAS, similar to our renaming of namelist and stream variables, if possible. I'd make the following change

> To the extent possible, ACME-specific config options such as casename and ref_casename_v0 should be generalized in a way that is also appropriate for MPAS-only runs.

Collaborator Author

Oh, I have no interest in going down that path. This is MPAS analysis and I have no interest in supporting anything more general. This is how you get code that is so generalized it's completely unreadable and useful to no one. We can discuss this philosophy tomorrow but I'm definitely opposed to trying to generalize beyond MPAS and ACME runs.

Contributor

I think the phrase "in a way that is also appropriate for MPAS-only runs." is somewhat ambiguous because there could be different uses for different cores and this assumes some type of particular codified standard, which may or may not exist. I'm not suggesting that we fully generalize this in a way that isn't helpful, just that we avoid unnecessarily hard coding things that don't absolutely have to be hard coded.

Collaborator Author

Okay, fair enough regarding generalization.

I will reword "in a way that is also appropriate for MPAS-only runs" rather than getting rid of it. To me it clarifies what the scope of MPAS-Analysis needs to be: not just ACME runs but also any other runs involving the MPAS components we support.


Currently, there is an option `yr_offset` that is intended to apply to all dates and observations. This option should be removed and possibly replaced by an input and output reference date for each model run and observational data set. Discussion is needed on whether these reference dates are actually necessary once we find a more generalized calendar (e.g. `netcdftime`).
Contributor

Great!

Collaborator Author

Hmm, I'm going to take this out of this design doc. I think getting rid of yr_offset is too big a task for this project. I already removed that change from the template in #86.

Contributor

We need to figure out what to do about yr_offset and related things like that because this appears to be a potential problem that may be causing unnecessary roadblocks.

Collaborator Author

Yes, absolutely! But not here.


The following comment describes the planned implementation in the config file.
```
# a list of which analyses to generate. Valid names are:
```

```
./run_analysis.py config.analysis --generate all,no_ocean,all_timeseries
```
(I am open to other syntax.) If the `--generate` flag is used on the command line, it will replace the `generate` option in the config file.
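As a sketch of how such a `generate` list might be evaluated (the entry prefixes and the processed-in-order semantics are my assumptions here, not necessarily the actual implementation in #86):

```python
# Hypothetical sketch of evaluating a generate list. Entries are
# scanned in order, so later entries override earlier ones; the
# 'all_'/'no_' prefixes and tag matching are assumptions.

def check_generate(generate_list, analysis_name, tags):
    """Return True if the named analysis should run."""
    generate = False
    for entry in generate_list:
        if entry == 'all':
            generate = True
        elif entry.startswith('all_'):
            # e.g. 'all_timeseries' enables every analysis tagged 'timeseries'
            if entry[4:] in tags:
                generate = True
        elif entry == 'no_' + analysis_name:
            generate = False
        elif entry.startswith('no_'):
            # e.g. 'no_ocean' disables every analysis tagged 'ocean'
            if entry[3:] in tags:
                generate = False
        elif entry == analysis_name:
            generate = True
    return generate

# 'all,no_ocean,all_timeseries': turn everything on, turn ocean
# analyses off, then turn ocean time series back on
flags = ['all', 'no_ocean', 'all_timeseries']
print(check_generate(flags, 'ohc_timeseries', {'ocean', 'timeseries'}))  # True
print(check_generate(flags, 'sst_modelvsobs', {'ocean', 'modelvsobs'}))  # False
```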
Contributor

Do we want the command line options to fully overwrite the config options? If so, should an effective config file be generated for documentation purposes?

Collaborator Author

To the first question, yes, absolutely. How else would the outcome be clear? If you just concatenated the command line options on the end of the list of options in the config file, it would make things much more complex rather than simpler.

Yes to the second one. We should do this anyway because there are already a few config options that are generated in the code if they aren't present in the config file. But this is not an issue for this PR to address in my opinion.
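For what it's worth, dumping an "effective" config after overrides are applied could be done with Python's stock `configparser` (a sketch; the section and option names are placeholders):

```python
import configparser
import io

# Build the effective configuration: start from the config file's
# options, then apply any command-line overrides on top.
config = configparser.ConfigParser()
config.read_dict({'output': {'generate': "['all']"}})

# Hypothetical command-line override replacing the config-file value
config.set('output', 'generate', "['all', 'no_ocean', 'all_timeseries']")

# Write the effective config out for provenance
buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```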

Contributor

> We should do this anyway because there are already a few config options that are generated in the code if they aren't present in the config file. But this is not an issue for this PR to address in my opinion.

I think this type of thing needs an issue so that we don't forget that this is something we ultimately need to do to keep the repo clean. Are you opposed to having an issue for output of a provenance-useful config file that can be used to reproduce the analysis? We could potentially check off #23 and/or #35 this way.

Collaborator Author

Yes, I agree that such an issue should be added. Could you take care of adding it?

Contributor

See the soon-to-exist #94.


This item needs some discussion. In #86, I would like to rename `ref_casename_v0` to simply `ref_casename`. `casename` and `ref_casename_v0` are used for file names and figure titles (essentially legends). It would be useful to discuss what the relevant equivalents would be for standalone MPAS runs.
Contributor

Obviously needs discussion in our call

Collaborator Author

@milenaveneziani had a solution that seems reasonable (with a bit of adaptation). We could have `runName` for the current run, `referenceRunName` for a reference MPAS or ACME run, and `preprocessedReferenceRunName` (or something similar) to replace `ref_casename_v0` where the data doesn't come from an MPAS run and therefore has been preprocessed.

I am definitely open to alternative solutions but that one works well for me.
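For illustration, that proposal might look like this in the config file (the section name and values are placeholders, not part of the proposal itself):

```
[runs]
# name of the current run (used in file names and figure titles)
runName = myRun
# name of a reference MPAS or ACME run to compare against (or None)
referenceRunName = None
# name of a preprocessed (non-MPAS) reference run, replacing ref_casename_v0
preprocessedReferenceRunName = myOldRun
```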


This will be difficult to test with CI since we currently don't have CI that can perform tests with `run_analysis.py`.
Contributor

@xylar, is this something we need, e.g., CI testing for run_analysis.py? I think it is hard to test a full workflow with output that is a png and don't see any clear way to do this.

Collaborator Author

Right, I think that's not really the point of CI anyway. You want to break the code into units as opposed to trying to test the whole thing at once.

Still, until we have some example config files to use as "standard" test runs, our testing will remain pretty arbitrary.

Contributor

Note, we can use coveralls, landscape, etc. to point to where we need to write tests if this becomes an issue.

Collaborator Author

I'm afraid I didn't follow any of that. coveralls and landscape are libraries? Could you clarify what you have in mind?

Contributor

Coveralls and landscape are GitHub "plug-ins" that can be run automatically, like CI, to provide diagnostics about code quality. They can indicate parts of the code that may need further testing to ensure a robust and maintainable code base. At present, I'm not suggesting we use them, just that we keep them in mind if we need to identify insufficient test coverage.

Collaborator Author

Ah, good to know! Something to look into down the road, I agree.


This will be difficult to test with CI since we currently don't have CI that can perform tests with `run_analysis.py`.

Instead, I will manually test a variety of combinations of `generate` lists (both in the config file and on the command line). I will list the tests I perform in #86.
Contributor

We should be able to test the generate functionality with example config files in CI. We don't need run_analysis.py to do this check, just a config file and the parsers for the config file. This seems to be the right way to test and ensure that functionality isn't lost in the long-term.

Collaborator Author

I think you do need one of the functions out of run_analysis.py (check_generate: https://github.com/MPAS-Dev/MPAS-Analysis/pull/86/files#diff-199f356fab800f1dc95e9c7c5567146cR38) but that would be pretty easy to use in some CI testing. I'll add that to the design doc and also to the code in #86 when I can.
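A minimal CI test along these lines could parse a sample config and exercise the generate logic directly, without touching run_analysis.py (a sketch; the section name is assumed, and the parsing helper is illustrative):

```python
import ast
import configparser

def get_generate_list(config_text):
    """Parse the generate option out of config text into a Python list,
    as a CI test could do without running the full analysis."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    return ast.literal_eval(config.get('output', 'generate'))

sample = """
[output]
generate = ['all', 'no_ocean', 'all_timeseries']
"""

assert get_generate_list(sample) == ['all', 'no_ocean', 'all_timeseries']
```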


The analysis will be tested on both MPAS and ACME runs to make sure the resulting image files have useful file names and titles (legends).
Contributor

How will you assess success for this goal? Naming is a subjective process, at least for the best names. This seems like something that will require some group discussion.

Collaborator Author

Yes, of course!


A consistent convention of capitalization and underscores should be used throughout the config file.

Collaborator Author

@pwolfram and @milenaveneziani, could you please let me know if this requirement makes sense to you? If so, I would like to begin to implement it.

Collaborator

sounds good to me

```
# option:
# ./run_analysis.py config.analysis --generate \
# all,no_ocean,all_plotTimeSeries
generate = ['all']
```
Collaborator Author

This demonstrates how I'd like to rename the different analysis scripts using CamelCase. @pwolfram and @milenaveneziani, again, I want to make sure you're good with this before I dive into implementation.


Implementation of the `config.template` file can be found [here](https://github.com/xylar/MPAS-Analysis/blob/5d5f64bde6ecf1d71f375a61783ff30f1654df01/config.template).

Collaborator Author

@milenaveneziani and @pwolfram, could you take a careful look at the latest template and make sure you're okay with it?
https://github.com/xylar/MPAS-Analysis/blob/5d5f64bde6ecf1d71f375a61783ff30f1654df01/config.template
I am going to go ahead with implementation of the necessary changes in the code itself today but want to address any concerns as soon as possible before I get too far.

Contributor

Minor comments:

For https://github.com/xylar/MPAS-Analysis/blob/5d5f64bde6ecf1d71f375a61783ff30f1654df01/config.template#L157 -- we have the option to compare against a model, which is commented out. I think this is ok for now but just want to make sure we keep that option open for the future if possible.

It would be good to get to the bottom of https://github.com/xylar/MPAS-Analysis/blob/5d5f64bde6ecf1d71f375a61783ff30f1654df01/config.template#L299, but this is not essential.

I'd say go for it!

Collaborator Author

@pwolfram, I'll look into the second issue when I get to working on ocean.modelvsobs. My guess is the last contour value just isn't being used, which would make the diff plots weirdly asymmetric in color space. I'll update the template somehow if that's the case or ask @vanroekel about it.

Collaborator

The template looks good, @xylar.
About the first issue @pwolfram brings up: I think compareWithModel could safely go at some point, because we will be relying on preprocessedRunName and referenceRunName, whether they are None or not, to plot results from another model. Let's keep it there for now though.

@xylar
Collaborator Author

xylar commented Feb 2, 2017

@pwolfram and @milenaveneziani, I guess this can get merged just after #86, because then we know it's served its purpose and we're not likely to do further editing.

@pwolfram
Contributor

pwolfram commented Feb 8, 2017

@xylar and @milenaveneziani, LGTM.

@milenaveneziani milenaveneziani merged commit 18bb5d1 into MPAS-Dev:develop Feb 10, 2017
@xylar xylar deleted the design_doc_config_reorg branch February 23, 2017 10:52