-
Notifications
You must be signed in to change notification settings - Fork 52
Reorganize config.tempalate (formerly config.analysis) #86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@milenaveneziani and @pwolfram, I would like to do some pretty significant cleanup and reorganization of the config file. So far, this PR just shows the changes in the config file itself, not the code changes that would be required to make them happen. Before I proceed, I wanted to get your feedback. |
One part of the reorganization is just trying to have more intuitive section names and ordering of sections. For example, I got rid of the
|
Another part of the reorganization was to eliminate redundancy in subdirectories. Here are a couple of examples:
|
A third part of the reorganization is about having a way of more easily switching on and off parts of the analysis:
|
Finally, I wanted to have a better way of specifying the reference date for the input data vs. that for the analysis itself. My suggested approach is:
It seems like there's no reason the observations and previous model results couldn't be modified as part of their preprocessing to simply have the dates that the analysis expects. But given that isn't currently the case, it seems like we can support this kind of a thing for each set of observations separately. This is the change that has the most effect on the code, so this is where I'd particularly like feedback. It also relates to #83, since this would be an alternative (but significantly more involved) fix. |
Assuming you approve, I would like to do this reorganization before finishing #82 because that PR would rely on the |
config.template
Outdated
|
||
# an alternative example that would perform all analysis except | ||
# 'ohc_timeseries' | ||
#generate = ['sst_timeseries', 'all_modelvsobs', 'all_seaice'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, is this going to be an either-or approach? Maybe we could also allow the syntax -ohc_timeseries
which would imply skip that analysis, so
generate = ['all', '-ohc_timeseries']
is simply all the analysis but skipping ohc_timeseries
. This provides even greater flexibility and clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For full clarity one could require all
to be +all
, but then this is potentially cumbersome. Another approach is to have a field
skip_generation = ['ohc_timeseries']
which may be clearer, albeit less compact although the code would essentially be doing set operations on these lists either way (obviously post-expansion of terms like all
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, I think your idea is a good one. I wasn't thinking of my example as being a good way of doing all analysis other than ohc_timeseries
, just that it happened to work out that way. But I can see how it came across that way.
Anyway, I agree that there needs to be a way of explicitly turning off an analysis function. The only problem I have with the -ohc_timeseries
syntax is that I was thinking of having a similar syntax to generate
on the command line, for example:
./run_analysis config.analysis --generate sst_timeseries,all_modelvsobs,all_seice
(I'm open to other command line syntax). I think it might be a bit confusing to have the -ohc_timeseries
syntax on the command line (or it might fail completely). Would you be open to no_ohc_timeseries
(as well as no_ocean
, no_timeseries
, etc.) as alternatives to the minus sign? This would be consistent with the approach used in run_test.py --no_init_step1 --no_forward
syntax used in the MPAS testcases infrastructure (soon to be renamed COMPASS).
@xylar, this looks good to me and is definitely a step in the right direction. I added a minor comment that can hopefully be used to improve clarity even further. |
@pwolfram, thanks for the comments. It seems like I should make a design doc for this, which I will try to do in the next few days. |
96c0f28
to
2000a9d
Compare
#91 is a design document describing this PR. |
I have now implemented and tested "Requirement: a simple way of turning on and off individual" from #89. |
config.template
Outdated
# ref_casename is the name of a reference run to compare agains (or None | ||
# to turn off comparison with a reference, e.g. if no reference case is | ||
# available) | ||
ref_casename = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar: sorry for the delay, I am finally taking a look at this now.
As far as ref_casename goes, I added a '_v0' because we want to keep the reference casename variable different from ACME v0 and v1. We already have a ref_casename to be used to represent a version of ACME greater than 0 (ie, strictly MPAS), so I used ref_casename_v0 for an ACME v0 case (POP based, hence treated as pre-processed data). Mind you, there aren't many v0 simulations we will want to compare to. Here is an exhaustive list:
- ACME v0 low-res (1deg atm/land, 1deg POP/CICE)
- ACME v0 high-res, pre-industrial (0.25deg atm/land, 0.1deg POP/CICE)
- ACME v0 high-res, present-day transients (same resolution as 2)
So, to summarize, we want to keep ref_casename_v0 separate from ref_casename (just to be clear, we are not using the latter, currently, but it will be nice to in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@milenaveneziani, thanks for the clarification here. I think we should discuss this in our call tomorrow.
My feeling is that there should be as little as possible in both the config file and in the analysis itself that is ACME specific. Instead, it might make sense to support analysis for a reference run that has the same formatting at the main run (e.g. a run directory and corresponding namelist and streams files) and also for another preprocessed format, of which v0 runs would be one example. Let's discuss what the appropriate option names would be and which section(s) they would go into.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your points @xylar. I am open to renaming things. We could have a section called ocean_reference_nonMPAS or something along those lines.
config.template
Outdated
# names of namelist and streams files | ||
|
||
# directory containing model results | ||
basedir = /dir/to/model/output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a heads up, it may be worthwhile to note here that eventually we will have to support the option that model output has been short-term archived. For the time being, short-term archive is off and all model results and files are in one single output directory. When short-term archiving is on, ocean files will go in one directory, sea-ice files in another, so on so forth. The ACME preAndPostProcessing script now has a flag to indicate whether short-term archiving is on or off, so we could use that in the future to define a basedir directory differently for each model component, if short-term archiving is on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@milenaveneziani, I think maybe this issue is too big to address in this PR and should be make a separate issue.
Again, I feel strongly that any solution shouldn't be aware of the concept that are ACME-specific such as "short-term archiving". Instead, we need to have a flexibility in finding the appropriate files (both the data sets themselves and their corresponding namelist and streams files). I have a feeling this could be logistically challenging because files may not be in the locations specified relative to the streams files after archiving (unless the namelist and streams files are archived along with the data, which would be very helpful).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can tell you what CESM used to do (I don't think we ever used short-term archiving with ACME because it does not work for MPAS): all the logs files were saved in a logs/ directory, and all the files needed for restart (and that I suppose will include namelists and streams for MPAS) were saved in a rest/ directory.
So, yes, it will be challenging, or maybe not if we have a separate section for restart files and namleists/streams. But I agree that we shouldn't worry about this in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct in understanding that we can support relative path names with respect to basedir
? If basedir=/
, then we would essentially have to have the full path for all associated and derived paths, correct? If so, this appears to add syntactic sugar without designing away capability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, yes, I think you're right that you could give basedir = /
if you wanted to give the others as absolute paths. I wouldn't expect that to be a frequent use case though.
config.template
Outdated
obs_mlddir = /global/project/projectdirs/acme/observations/Ocean/MLD | ||
obs_seaicedir = /global/project/projectdirs/acme/observations/SeaIce | ||
ref_archive_v0_ocndir = /global/project/projectdirs/acme/ACMEv0_lowres/B1850C5_ne30_v0.4/ocn/postprocessing | ||
ref_archive_v0_seaicedir = /global/project/projectdirs/acme/ACMEv0_lowres/B1850C5_ne30_v0.4/ice/postprocessing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, we want to preserve the '_v0' part here, to differentiate it with a ref_case that is v1 and beyond.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss this. I see the need for something equivalent to "v0" but I am opposed to the ACME-specific nature of the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
config.template
Outdated
# Each subsequent list entry can be used to alter previous list entries. For | ||
# example, the following would produce all analyses except ocean model vs. | ||
# observations (albeit not in a very intuitive way): | ||
#generate = ['all', 'no_ocean', 'all_timeseries'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great to have all these options that can be specified at run time!
not sure if this is a good place to do this, but I thought we could start renaming the modelvsobs scripts to something like regriddedHorSlice or interpHorSlice (with a VertSlice counterpart, eventually). And then compare to obs only in the case that obs are actually available for comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@milenaveneziani, let's discuss this rename. Potentially, it's a good time to do such a rename. I'm not a big fan of these shortened names. I much prefer a more robust name where words are written out in full. I also don't think these are typically "slices" in the same sense as our vertical transects so I would pick something more like regridHorizontalField
. But I also want to be careful about having analysis names that sound too much like intermediate actions that are part of the analysis. In this case, the purpose of the script (what it actively does) is ultimately to plot something, not to regrid something. So I think these names should get some careful thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, I am very open to choice of names, and I also tend to like long names. How about plotRegridHorizontalField?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep something like that would work very well for me. I would suggest plotRegriddedHorizontalField
so that there's only one active verb: "What are we plotting? A regridded horizontal field!" ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for names in CamelCase that are more vs less descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use this PR to clean up the formatting of all the variables in the config file using CamelCase. I would be in favor of that. I would not be in favor of expanding this PR to clean up the whole repo to use CamelCase, but we could do that at a later stage, preferably before things progress too far with the current random mix of case conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we don't want the entire repo to use CamelCase, at least at this point, but it does appear to be useful for the config files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
@@ -29,3 +29,8 @@ You can easily install them via the conda command: | |||
``` | |||
conda install -c scitools -c https://conda.anaconda.org/opengeostat numpy scipy matplotlib ipython notebook netCDF4 progressbar vtk cartopy xarray dask bottleneck pyevtk numexpr basemap | |||
``` | |||
|
|||
## Running the analysis | |||
1. create a configuration file by copying and modifying `config.template` or one of the example files in `configs` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we exclude the custom config files in gitignore? A quick google search revealed that we could have something like this in .gitignore:
/config.*
!/config.template
which basically means ignore all config.* but config.template.
@pwolfram: I am working on some examples for LANL machines and rhea. |
run_analysis.py
Outdated
print "" | ||
print "Plotting SST time series..." | ||
from mpas_analysis.ocean.sst_timeseries import sst_timeseries | ||
sst_timeseries(config, streamMap=oceanStreamMap, | ||
variableMap=oceanVariableMap) | ||
|
||
if config.getboolean('nino34_timeseries', 'generate'): | ||
if checkGenerate(config, analysisName='timeSeriesNino34', | ||
mpasCore='ocean', analysisCategory='timeSeries'): | ||
print "" | ||
print "Plotting Nino3.4 time series..." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar: I would comment out the print statements that are associated with analysis that is not yet available, because they may be cause of confusion.
@xylar: I have included two config files for LANL and OLCF machines, and pushed my changes to config_reorg on my fork. I have done some testing with these config files and everything seems to work just fine. |
@milenaveneziani, thanks for the comments. I think I have addressed them all and I brought in your two example config files, which look great. I agree that there's no need for beta0 directories. I thought it might make specific configurations easier to find but we can always add subdirectories later if they seem necessary. I will add a readme to the configs directory making clear that the intended usage is to copy these files to the root and modify them, rather than to modify them in place. We don't want people editing the output baseDirectory, for example, and committing it back to the repo. |
I just added another config file for LANL. This one is for a small, short run on wolf that I've been using for debugging. I tested the branch there and it ran successfully. @milenaveneziani, if you would be up for generating the mapping file that would be needed for the sea-ice climatology analysis in that run, we could use this run as a more complete test of the analysis. If you'd rather wait on that, I will eventually (in a few weeks to a month, hopefully) have a version of the interpolation code that doesn't require having a mapping file. |
Ah, okay. Sounds good. |
I just added the preprocessed reference cases to che LANL config files. I re-tested the runs and they work. |
@xylar and @milenaveneziani, LGTM. |
Thanks, @pwolfram! When Edison is back up, I'd suggest that @milenaveneziani and I run some tests. Then, I'll squash a bunch of the higher level commits. Then, I think we're ready for @milenaveneziani to do the merge? And probably that of #91 at about the same time? |
@xylar: sounds good. |
Rename config.analysis to config.template and perform some major conceptual reorganization and cleanup. Note: corresponding changes to the code have not been made, so this is simply a concept for motivating the required code changes. Change how analyses are turned on/off ('generate') With this merge, there is a single list 'generate' in the '[output]' section that determines which analyses are run. This option replaces the 'generate' options in in the sections for each specific analysis module. The syntax for generate is described in config.template as follows. 'generate' is a list of which analyses to generate. Valid names are: 'ohc_timeseries', 'sst_timeseries', 'sst_modelvsobs', 'sss_modelvsobs', 'mld_modelvsobs', 'seaice_timeseries', 'seaice_modelvsobs' the following shortcuts exist: 'all' -- all analyses will be run 'all_timeseries' -- all time-series analyses will be run 'all_modelvsobs' -- all analyses comparing model climatologies with observatoins will be run 'all_ocean' -- all ocean analyses will be run 'all_seaice' -- all sea-ice analyses will be run 'no_ohc_timeseries' -- skip the 'ohc_timeseries' (and similarly with the other analyses). 'no_ocean', 'no_timeseries', etc. -- in analogy to 'all_*', skip the given category of analysis With a command line argument `--generate`, a comma-separated list can be used to select which analyses are performed. The command-line argument overrides the config option. Move input/output paths to config sections This merge moves the input path to 'basedir' in [input] and the output path to 'basedir' in [output]. The subpaths in output are no longer full paths, just subpaths off of 'basedir'. A few formatting issues were also cleaned up, primarily changing tabs to spaces. Switch observation and reference sims. to own sections Switch the folders for observations and reference simulations in the config file to their own files (ocean_observations, ocean_reference, etc.). Make 'climatology' and 'time_series' config sections These now contain the bounds in time of their respective analyses. ref_casename_v0 --> ref_casename This change has so far only been made in the config file and where the ref_casename is parsed. More work should be done in the future to make sure other reference cases than ACME v0 can be used here. Update config.template to use CamelCase Some other minor simplifications and reorganization is also included. Update run_analysis to be compatible with new config run_analysis now uses CamelCase throughout. Docstring have been added to utility functions. A new utility function has been added that makes a full path from a base directory and relative path read from a config file. Update analysis and plotting utilities based on new config The ocean.timeseries_ohc and shared.plot.plotting have been updated to be compatible with the new config template. Analysis functions have also been updated to use CamelCase and a few minor formatting and PEP8 issues have been addressed
Add a README in the configs directory
Also mention the `generate` option and command-line flag
@milenaveneziani and @pwolfram, that plan sounds good to me! |
I rebased, so this PR is just awaiting testing on Edison once /project is back up. Obviously that won't happen today (for me at least). |
@xylar: I added the correct remap file in the example config file for the GMPAS-240km case on LANL IC machines (hope that was OK with you), and then tested it. Everything worked fine, so I went ahead and merged this. |
Yep, that's perfect! Thank you, @milenaveneziani. |
Rename config.analysis to config.template and perform some major conceptual reorganization and cleanup.
Corresponding changes have been made to the analysis scripts and example config files for various machines have been added.