Reorganize config.tempalate (formerly config.analysis) #86

xylar · 2017-01-27T13:36:26Z

Rename config.analysis to config.template and perform some major conceptual reorganization and cleanup.

Corresponding changes have been made to the analysis scripts and example config files for various machines have been added.

xylar · 2017-01-27T13:38:41Z

@milenaveneziani and @pwolfram, I would like to do some pretty significant cleanup and reorganization of the config file. So far, this PR just shows the changes in the config file itself, not the code changes that would be required to make them happen.

Before I proceed, I wanted to get your feedback.

xylar · 2017-01-27T13:41:11Z

One part of the reorganization is just trying to have more intuitive section names and ordering of sections. For example, I got rid of the [paths] section and instead have paths in hopefully more intuitive categories. Here's the new [input] section as an example:

[input]

# directory containing model results
basedir = /dir/to/model/output
# names of namelist and streams files.  If not in basedir, give full path
ocean_namelist_filename = mpas-o_in
ocean_streams_filename = streams.ocean
seaice_namelist_filename = mpas-cice_in
seaice_streams_filename = streams.cice
...

xylar · 2017-01-27T13:43:00Z

Another part of the reorganization was to eliminate redundancy in subdirectories. Here are a couple of examples:

[output]

# directory where analysis should be written
basedir = /dir/to/analysis/output

# subdirectories within basedir for analysis output
scratch_subdir = scratch
plots_subdir = plots
logs_subdir = logs
clim_subdir = clim
...
ocean_observations]

# directory where ocean observations are stored
basedir = /dir/to/ocean/observations
sst_subdir = SST
sss_subdir = SSS
mld_subdir = MLD

xylar · 2017-01-27T13:44:55Z

A third part of the reorganization is about having a way of more easily switching on and off parts of the analysis:

[output]
...
# a list of which analyses to generate.  Valid names are:
#   'ohc_timeseries', 'sst_timeseries', 'sst_modelvsobs', 'sss_modelvsobs',
#   'mld_modelvsobs', 'seaice_timeseries', 'seaice_modelvsobs'
# the following shortcuts exist:
#   'all' -- all analyses will be run
#   'all_timeseries' -- all time-series analyses will be run
#   'all_modelvsobs' -- all analyses comparing model climatologies with
#                       observatoins will be run
#   'all_ocean' -- all ocean analyses will be run
#   'all_seaice' -- all sea-ice analyses will be run
generate = ['all']

# an alternative example that would perform all analysis except
#   'ohc_timeseries'
#generate = ['sst_timeseries', 'all_modelvsobs', 'all_seaice']

xylar · 2017-01-27T13:50:05Z

Finally, I wanted to have a better way of specifying the reference date for the input data vs. that for the analysis itself. My suggested approach is:

[input]
...
# analysis_ref_date is the starting date of the simulation according to the
#    analysis.  The starting date of the simulation acccording to the input
#    files will be read in from the variable simulationStartTime.  If
#    simulationStartTime is different from analysis_ref_date, an offset will
#    be added to all dates when they are read in.  This would be desirable,
#    for example, if the simulation year 0001 should correspond with analysis
#    year 1850.
#
# If no analysis offset is desired, simply comment out or remove
# analysis_ref_date from this section.
analysis_ref_date = 1850-01-01

[ocean_observations]
...
input_ref_date = 0001-01-01
analysis_ref_date = 1850-01-01

[ocean_reference]
...
input_ref_date = 0001-01-01
analysis_ref_date = 1850-01-01

[seaice_observations]
...
input_ref_date = 0001-01-01
analysis_ref_date = 1850-01-01

[seaice_reference]
...
input_ref_date = 0001-01-01
analysis_ref_date = 1850-01-01

It seems like there's no reason the observations and previous model results couldn't be modified as part of their preprocessing to simply have the dates that the analysis expects. But given that isn't currently the case, it seems like we can support this kind of a thing for each set of observations separately.

This is the change that has the most effect on the code, so this is where I'd particularly like feedback. It also relates to #83, since this would be an alternative (but significantly more involved) fix.

xylar · 2017-01-27T14:08:18Z

Assuming you approve, I would like to do this reorganization before finishing #82 because that PR would rely on the input_ref_date and analysis_ref_date options being available from the changes in PR.

pwolfram · 2017-01-27T15:46:34Z

config.template

+
+# an alternative example that would perform all analysis except
+#   'ohc_timeseries'
+#generate = ['sst_timeseries', 'all_modelvsobs', 'all_seaice']


@xylar, is this going to be an either-or approach? Maybe we could also allow the syntax -ohc_timeseries which would imply skip that analysis, so

generate = ['all', '-ohc_timeseries']

is simply all the analysis but skipping ohc_timeseries. This provides even greater flexibility and clarity.

For full clarity one could require all to be +all, but then this is potentially cumbersome. Another approach is to have a field

skip_generation = ['ohc_timeseries']

which may be clearer, albeit less compact although the code would essentially be doing set operations on these lists either way (obviously post-expansion of terms like all).

@pwolfram, I think your idea is a good one. I wasn't thinking of my example as being a good way of doing all analysis other than ohc_timeseries, just that it happened to work out that way. But I can see how it came across that way.

Anyway, I agree that there needs to be a way of explicitly turning off an analysis function. The only problem I have with the -ohc_timeseries syntax is that I was thinking of having a similar syntax to generate on the command line, for example:

./run_analysis config.analysis --generate sst_timeseries,all_modelvsobs,all_seice

(I'm open to other command line syntax). I think it might be a bit confusing to have the -ohc_timeseries syntax on the command line (or it might fail completely). Would you be open to no_ohc_timeseries (as well as no_ocean, no_timeseries, etc.) as alternatives to the minus sign? This would be consistent with the approach used in run_test.py --no_init_step1 --no_forward syntax used in the MPAS testcases infrastructure (soon to be renamed COMPASS).

pwolfram · 2017-01-27T15:49:53Z

@xylar, this looks good to me and is definitely a step in the right direction. I added a minor comment that can hopefully be used to improve clarity even further.

xylar · 2017-01-28T15:29:22Z

@pwolfram, thanks for the comments. It seems like I should make a design doc for this, which I will try to do in the next few days.

xylar · 2017-01-29T17:07:19Z

#91 is a design document describing this PR.

xylar · 2017-01-29T17:08:06Z

I have now implemented and tested "Requirement: a simple way of turning on and off individual" from #89.

milenaveneziani · 2017-01-31T04:24:50Z

config.template

+# ref_casename is the name of a reference run to compare agains (or None
+# to turn off comparison with a reference, e.g. if no reference case is
+# available)
+ref_casename = None


@xylar: sorry for the delay, I am finally taking a look at this now.
As far as ref_casename goes, I added a '_v0' because we want to keep the reference casename variable different from ACME v0 and v1. We already have a ref_casename to be used to represent a version of ACME greater than 0 (ie, strictly MPAS), so I used ref_casename_v0 for an ACME v0 case (POP based, hence treated as pre-processed data). Mind you, there aren't many v0 simulations we will want to compare to. Here is an exhaustive list:

ACME v0 low-res (1deg atm/land, 1deg POP/CICE)

ACME v0 high-res, pre-industrial (0.25deg atm/land, 0.1deg POP/CICE)

ACME v0 high-res, present-day transients (same resolution as 2)

So, to summarize, we want to keep ref_casename_v0 separate from ref_casename (just to be clear, we are not using the latter, currently, but it will be nice to in the future).

@milenaveneziani, thanks for the clarification here. I think we should discuss this in our call tomorrow.

My feeling is that there should be as little as possible in both the config file and in the analysis itself that is ACME specific. Instead, it might make sense to support analysis for a reference run that has the same formatting at the main run (e.g. a run directory and corresponding namelist and streams files) and also for another preprocessed format, of which v0 runs would be one example. Let's discuss what the appropriate option names would be and which section(s) they would go into.

I see your points @xylar. I am open to renaming things. We could have a section called ocean_reference_nonMPAS or something along those lines.

milenaveneziani · 2017-01-31T04:37:42Z

config.template

-# names of namelist and streams files
+
+# directory containing model results
+basedir = /dir/to/model/output


As a heads up, it may be worthwhile to note here that eventually we will have to support the option that model output has been short-term archived. For the time being, short-term archive is off and all model results and files are in one single output directory. When short-term archiving is on, ocean files will go in one directory, sea-ice files in another, so on so forth. The ACME preAndPostProcessing script now has a flag to indicate whether short-term archiving is on or off, so we could use that in the future to define a basedir directory differently for each model component, if short-term archiving is on.

@milenaveneziani, I think maybe this issue is too big to address in this PR and should be make a separate issue.

Again, I feel strongly that any solution shouldn't be aware of the concept that are ACME-specific such as "short-term archiving". Instead, we need to have a flexibility in finding the appropriate files (both the data sets themselves and their corresponding namelist and streams files). I have a feeling this could be logistically challenging because files may not be in the locations specified relative to the streams files after archiving (unless the namelist and streams files are archived along with the data, which would be very helpful).

I can tell you what CESM used to do (I don't think we ever used short-term archiving with ACME because it does not work for MPAS): all the logs files were saved in a logs/ directory, and all the files needed for restart (and that I suppose will include namelists and streams for MPAS) were saved in a rest/ directory.
So, yes, it will be challenging, or maybe not if we have a separate section for restart files and namleists/streams. But I agree that we shouldn't worry about this in this PR.

Am I correct in understanding that we can support relative path names with respect to basedir? If basedir=/, then we would essentially have to have the full path for all associated and derived paths, correct? If so, this appears to add syntactic sugar without designing away capability.

@pwolfram, yes, I think you're right that you could give basedir = / if you wanted to give the others as absolute paths. I wouldn't expect that to be a frequent use case though.

milenaveneziani · 2017-01-31T04:39:09Z

config.template

-obs_mlddir = /global/project/projectdirs/acme/observations/Ocean/MLD
-obs_seaicedir = /global/project/projectdirs/acme/observations/SeaIce
-ref_archive_v0_ocndir = /global/project/projectdirs/acme/ACMEv0_lowres/B1850C5_ne30_v0.4/ocn/postprocessing
-ref_archive_v0_seaicedir = /global/project/projectdirs/acme/ACMEv0_lowres/B1850C5_ne30_v0.4/ice/postprocessing


again, we want to preserve the '_v0' part here, to differentiate it with a ref_case that is v1 and beyond.

Let's discuss this. I see the need for something equivalent to "v0" but I am opposed to the ACME-specific nature of the name.

milenaveneziani · 2017-01-31T04:44:13Z

config.template

+# Each subsequent list entry can be used to alter previous list entries. For
+# example, the following would produce all analyses except ocean model vs.
+# observations (albeit not in a very intuitive way):
+#generate = ['all', 'no_ocean', 'all_timeseries']


great to have all these options that can be specified at run time!

not sure if this is a good place to do this, but I thought we could start renaming the modelvsobs scripts to something like regriddedHorSlice or interpHorSlice (with a VertSlice counterpart, eventually). And then compare to obs only in the case that obs are actually available for comparison.

@milenaveneziani, let's discuss this rename. Potentially, it's a good time to do such a rename. I'm not a big fan of these shortened names. I much prefer a more robust name where words are written out in full. I also don't think these are typically "slices" in the same sense as our vertical transects so I would pick something more like regridHorizontalField. But I also want to be careful about having analysis names that sound too much like intermediate actions that are part of the analysis. In this case, the purpose of the script (what it actively does) is ultimately to plot something, not to regrid something. So I think these names should get some careful thought.

again, I am very open to choice of names, and I also tend to like long names. How about plotRegridHorizontalField?

Yep something like that would work very well for me. I would suggest plotRegriddedHorizontalField so that there's only one active verb: "What are we plotting? A regridded horizontal field!" ;-)

+1 for names in CamelCase that are more vs less descriptive.

We could use this PR to clean up the formatting of all the variables in the config file using CamelCase. I would be in favor of that. I would not be in favor of expanding this PR to clean up the whole repo to use CamelCase, but we could do that at a later stage, preferably before things progress too far with the current random mix of case conventions.

I agree that we don't want the entire repo to use CamelCase, at least at this point, but it does appear to be useful for the config files.

milenaveneziani · 2017-02-03T17:36:43Z

@xylar: should we add a few config.templates (the one you have works for edison and we could call it config.template.edison), one for LANL IC and one for rhea/OLCF, for example, and add them to this PR? We can then solve #80 at the same time.

milenaveneziani · 2017-02-03T18:02:18Z

README.md

@@ -29,3 +29,8 @@ You can easily install them via the conda command:
 ```
 conda install -c scitools  -c https://conda.anaconda.org/opengeostat numpy scipy matplotlib ipython notebook netCDF4 progressbar vtk cartopy xarray dask bottleneck pyevtk numexpr basemap
 ```
+
+## Running the analysis
+  1. create a configuration file by copying and modifying `config.template` or one of the example files in `configs`


Should we exclude the custom config files in gitignore? A quick google search revealed that we could have something like this in .gitignore:
/config.*
!/config.template

which basically means ignore all config.* but config.template.

milenaveneziani · 2017-02-03T20:35:15Z

@pwolfram: I am working on some examples for LANL machines and rhea.
Small comment: I am not sure we need to add subdirectories to configs/$machinename/ for specific runs (beta0/, etc). Seems like machine subdirectories are enough (and the code information could be added to the config.template name, as in the current example).

milenaveneziani · 2017-02-03T21:30:31Z

run_analysis.py

        print ""
        print "Plotting SST time series..."
        from mpas_analysis.ocean.sst_timeseries import sst_timeseries
        sst_timeseries(config, streamMap=oceanStreamMap,
                       variableMap=oceanVariableMap)

-    if config.getboolean('nino34_timeseries', 'generate'):
+    if checkGenerate(config, analysisName='timeSeriesNino34',
+                     mpasCore='ocean', analysisCategory='timeSeries'):
        print ""
        print "Plotting Nino3.4 time series..."


@xylar: I would comment out the print statements that are associated with analysis that is not yet available, because they may be cause of confusion.

milenaveneziani · 2017-02-03T21:52:50Z

@xylar: I have included two config files for LANL and OLCF machines, and pushed my changes to config_reorg on my fork. I have done some testing with these config files and everything seems to work just fine.
I would like to add one example of config file for an alpha7 run on edison: will do it later and report here.

xylar · 2017-02-04T10:56:00Z

@milenaveneziani, thanks for the comments. I think I have addressed them all and I brought in your two example config files, which look great. I agree that there's no need for beta0 directories. I thought it might make specific configurations easier to find but we can always add subdirectories later if they seem necessary.

I will add a readme to the configs directory making clear that the intended usage is to copy these files to the root and modify them, rather than to modify them in place. We don't want people editing the output baseDirectory, for example, and committing it back to the repo.

xylar · 2017-02-06T14:46:06Z

I just added another config file for LANL. This one is for a small, short run on wolf that I've been using for debugging. I tested the branch there and it ran successfully.

@milenaveneziani, if you would be up for generating the mapping file that would be needed for the sea-ice climatology analysis in that run, we could use this run as a more complete test of the analysis. If you'd rather wait on that, I will eventually (in a few weeks to a month, hopefully) have a version of the interpolation code that doesn't require having a mapping file.

milenaveneziani · 2017-02-06T16:42:59Z

@xylar: I talked with @maltrud about generating the mapping file for the 240. He usually prefers to handle it himself because he also takes care that the (sea-ice) observations are interpolated onto the same grid as the model data.

xylar · 2017-02-06T17:46:45Z

Ah, okay. Sounds good.

xylar · 2017-02-06T20:50:28Z

I just added the preprocessed reference cases to che LANL config files. I re-tested the runs and they work.

pwolfram · 2017-02-08T21:24:46Z

@xylar and @milenaveneziani, LGTM.

xylar · 2017-02-08T21:29:55Z

Thanks, @pwolfram!

When Edison is back up, I'd suggest that @milenaveneziani and I run some tests. Then, I'll squash a bunch of the higher level commits. Then, I think we're ready for @milenaveneziani to do the merge? And probably that of #91 at about the same time?

milenaveneziani · 2017-02-08T21:40:49Z

@xylar: sounds good.
Also, @pwolfram and I just had a good conversation on the phone and we agreed on a short-term plan: 1) I can review most of the time-related PR's, and Phillip will just keep an eye opened to make sure that there isn't something that he would like to double check (#104 could be an example, since it involves mpas_xarray). I will also work on the MOC script; 2) Phillip will concentrate on things that are more xarray related, such as the 'Too many open files' issue he has been working on lately.
Does this sound OK?

Rename config.analysis to config.template and perform some major conceptual reorganization and cleanup. Note: corresponding changes to the code have not been made, so this is simply a concept for motivating the required code changes. Change how analyses are turned on/off ('generate') With this merge, there is a single list 'generate' in the '[output]' section that determines which analyses are run. This option replaces the 'generate' options in in the sections for each specific analysis module. The syntax for generate is described in config.template as follows. 'generate' is a list of which analyses to generate. Valid names are: 'ohc_timeseries', 'sst_timeseries', 'sst_modelvsobs', 'sss_modelvsobs', 'mld_modelvsobs', 'seaice_timeseries', 'seaice_modelvsobs' the following shortcuts exist: 'all' -- all analyses will be run 'all_timeseries' -- all time-series analyses will be run 'all_modelvsobs' -- all analyses comparing model climatologies with observatoins will be run 'all_ocean' -- all ocean analyses will be run 'all_seaice' -- all sea-ice analyses will be run 'no_ohc_timeseries' -- skip the 'ohc_timeseries' (and similarly with the other analyses). 'no_ocean', 'no_timeseries', etc. -- in analogy to 'all_*', skip the given category of analysis With a command line argument `--generate`, a comma-separated list can be used to select which analyses are performed. The command-line argument overrides the config option. Move input/output paths to config sections This merge moves the input path to 'basedir' in [input] and the output path to 'basedir' in [output]. The subpaths in output are no longer full paths, just subpaths off of 'basedir'. A few formatting issues were also cleaned up, primarily changing tabs to spaces. Switch observation and reference sims. to own sections Switch the folders for observations and reference simulations in the config file to their own files (ocean_observations, ocean_reference, etc.). Make 'climatology' and 'time_series' config sections These now contain the bounds in time of their respective analyses. ref_casename_v0 --> ref_casename This change has so far only been made in the config file and where the ref_casename is parsed. More work should be done in the future to make sure other reference cases than ACME v0 can be used here. Update config.template to use CamelCase Some other minor simplifications and reorganization is also included. Update run_analysis to be compatible with new config run_analysis now uses CamelCase throughout. Docstring have been added to utility functions. A new utility function has been added that makes a full path from a base directory and relative path read from a config file. Update analysis and plotting utilities based on new config The ocean.timeseries_ohc and shared.plot.plotting have been updated to be compatible with the new config template. Analysis functions have also been updated to use CamelCase and a few minor formatting and PEP8 issues have been addressed

Add a README in the configs directory

Also mention the `generate` option and command-line flag

xylar · 2017-02-08T22:04:58Z

@milenaveneziani and @pwolfram, that plan sounds good to me!

xylar · 2017-02-08T22:05:44Z

I rebased, so this PR is just awaiting testing on Edison once /project is back up. Obviously that won't happen today (for me at least).

milenaveneziani · 2017-02-10T05:38:08Z

@xylar: I added the correct remap file in the example config file for the GMPAS-240km case on LANL IC machines (hope that was OK with you), and then tested it. Everything worked fine, so I went ahead and merged this.

xylar · 2017-02-10T07:02:36Z

Yep, that's perfect! Thank you, @milenaveneziani.

xylar added clean up in progress enhancement help wanted labels Jan 27, 2017

xylar assigned pwolfram and milenaveneziani Jan 27, 2017

pwolfram reviewed Jan 27, 2017

View reviewed changes

pwolfram mentioned this pull request Jan 27, 2017

Default config.analysis has un-useful default values #80

Closed

This was referenced Jan 28, 2017

Fixes yr_offset issue between obs and model data #83

Closed

Design Document: Config File Reorganization #89

Closed

xylar force-pushed the config_reorg branch 2 times, most recently from 96c0f28 to 2000a9d Compare January 29, 2017 17:06

xylar force-pushed the config_reorg branch from 8ef17af to 39063b0 Compare January 30, 2017 18:04

xylar removed in progress help wanted labels Jan 30, 2017

xylar force-pushed the config_reorg branch from c6e1dc6 to 8ad07a0 Compare January 30, 2017 23:04

milenaveneziani reviewed Jan 31, 2017

View reviewed changes

milenaveneziani reviewed Feb 3, 2017

View reviewed changes

xylar force-pushed the config_reorg branch from 41c90df to 5beb04c Compare February 4, 2017 10:46

xylar mentioned this pull request Feb 6, 2017

A reorganization of timekeeping to work with general calendars #102

Merged

pwolfram mentioned this pull request Feb 8, 2017

Move variable mapping out of mpas_xarray into new generalized_reader module #104

Merged

xylar added 4 commits February 8, 2017 22:55

Add example config files

1599b5f

Add a README in the configs directory

Update readme to include run instructions

b9ffc94

Also mention the `generate` option and command-line flag

Add CI testing for checkGenerate

16f2519

xylar force-pushed the config_reorg branch from 7ecd369 to 16f2519 Compare February 8, 2017 22:02

xylar changed the title ~~Reorganize config.analysis --> config.tempalate~~ Reorganize config.tempalate (formerly config.analysis) Feb 8, 2017

Added correct remap file in example config file for 240km mesh

682cf97

milenaveneziani merged commit 682cf97 into MPAS-Dev:develop Feb 10, 2017

xylar deleted the config_reorg branch February 10, 2017 07:01

xylar mentioned this pull request Feb 10, 2017

All "scripts" must work without user modification #2

Closed

2 tasks

Reorganize config.tempalate (formerly config.analysis) #86

Reorganize config.tempalate (formerly config.analysis) #86

Uh oh!

Conversation

xylar commented Jan 27, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xylar commented Jan 27, 2017

Uh oh!

xylar commented Jan 27, 2017

Uh oh!

xylar commented Jan 27, 2017

Uh oh!

xylar commented Jan 27, 2017

Uh oh!

xylar commented Jan 27, 2017

Uh oh!

xylar commented Jan 27, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pwolfram commented Jan 27, 2017

Uh oh!

xylar commented Jan 28, 2017

Uh oh!

xylar commented Jan 29, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xylar commented Jan 29, 2017

Uh oh!

milenaveneziani Jan 31, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

milenaveneziani commented Feb 3, 2017

Uh oh!

milenaveneziani Feb 3, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xylar commented Jan 27, 2017 •

edited

Loading

xylar commented Jan 29, 2017 •

edited

Loading

milenaveneziani Jan 31, 2017 •

edited

Loading

milenaveneziani Feb 3, 2017 •

edited

Loading