Skip to content

Reorganize config.tempalate (formerly config.analysis) #86

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 10, 2017

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Jan 27, 2017

Rename config.analysis to config.template and perform some major conceptual reorganization and cleanup.

Corresponding changes have been made to the analysis scripts and example config files for various machines have been added.

@xylar
Copy link
Collaborator Author

xylar commented Jan 27, 2017

@milenaveneziani and @pwolfram, I would like to do some pretty significant cleanup and reorganization of the config file. So far, this PR just shows the changes in the config file itself, not the code changes that would be required to make them happen.

Before I proceed, I wanted to get your feedback.

@xylar
Copy link
Collaborator Author

xylar commented Jan 27, 2017

One part of the reorganization is just trying to have more intuitive section names and ordering of sections. For example, I got rid of the [paths] section and instead have paths in hopefully more intuitive categories. Here's the new [input] section as an example:

[input]

# directory containing model results
basedir = /dir/to/model/output
# names of namelist and streams files.  If not in basedir, give full path
ocean_namelist_filename = mpas-o_in
ocean_streams_filename = streams.ocean
seaice_namelist_filename = mpas-cice_in
seaice_streams_filename = streams.cice
...

@xylar
Copy link
Collaborator Author

xylar commented Jan 27, 2017

Another part of the reorganization was to eliminate redundancy in subdirectories. Here are a couple of examples:

[output]

# directory where analysis should be written
basedir = /dir/to/analysis/output

# subdirectories within basedir for analysis output
scratch_subdir = scratch
plots_subdir = plots
logs_subdir = logs
clim_subdir = clim
...
ocean_observations]

# directory where ocean observations are stored
basedir = /dir/to/ocean/observations
sst_subdir = SST
sss_subdir = SSS
mld_subdir = MLD

@xylar
Copy link
Collaborator Author

xylar commented Jan 27, 2017

A third part of the reorganization is about having a way of more easily switching on and off parts of the analysis:

[output]
...
# a list of which analyses to generate.  Valid names are:
#   'ohc_timeseries', 'sst_timeseries', 'sst_modelvsobs', 'sss_modelvsobs',
#   'mld_modelvsobs', 'seaice_timeseries', 'seaice_modelvsobs'
# the following shortcuts exist:
#   'all' -- all analyses will be run
#   'all_timeseries' -- all time-series analyses will be run
#   'all_modelvsobs' -- all analyses comparing model climatologies with
#                       observatoins will be run
#   'all_ocean' -- all ocean analyses will be run
#   'all_seaice' -- all sea-ice analyses will be run
generate = ['all']

# an alternative example that would perform all analysis except
#   'ohc_timeseries'
#generate = ['sst_timeseries', 'all_modelvsobs', 'all_seaice']

@xylar
Copy link
Collaborator Author

xylar commented Jan 27, 2017

Finally, I wanted to have a better way of specifying the reference date for the input data vs. that for the analysis itself. My suggested approach is:

[input]
...
# analysis_ref_date is the starting date of the simulation according to the
#    analysis.  The starting date of the simulation acccording to the input
#    files will be read in from the variable simulationStartTime.  If
#    simulationStartTime is different from analysis_ref_date, an offset will
#    be added to all dates when they are read in.  This would be desirable,
#    for example, if the simulation year 0001 should correspond with analysis
#    year 1850.
#
# If no analysis offset is desired, simply comment out or remove
# analysis_ref_date from this section.
analysis_ref_date = 1850-01-01

[ocean_observations]
...
input_ref_date = 0001-01-01
analysis_ref_date = 1850-01-01

[ocean_reference]
...
input_ref_date = 0001-01-01
analysis_ref_date = 1850-01-01

[seaice_observations]
...
input_ref_date = 0001-01-01
analysis_ref_date = 1850-01-01

[seaice_reference]
...
input_ref_date = 0001-01-01
analysis_ref_date = 1850-01-01

It seems like there's no reason the observations and previous model results couldn't be modified as part of their preprocessing to simply have the dates that the analysis expects. But given that isn't currently the case, it seems like we can support this kind of a thing for each set of observations separately.

This is the change that has the most effect on the code, so this is where I'd particularly like feedback. It also relates to #83, since this would be an alternative (but significantly more involved) fix.

@xylar
Copy link
Collaborator Author

xylar commented Jan 27, 2017

Assuming you approve, I would like to do this reorganization before finishing #82 because that PR would rely on the input_ref_date and analysis_ref_date options being available from the changes in PR.

config.template Outdated

# an alternative example that would perform all analysis except
# 'ohc_timeseries'
#generate = ['sst_timeseries', 'all_modelvsobs', 'all_seaice']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, is this going to be an either-or approach? Maybe we could also allow the syntax -ohc_timeseries which would imply skip that analysis, so

generate = ['all', '-ohc_timeseries']

is simply all the analysis but skipping ohc_timeseries. This provides even greater flexibility and clarity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For full clarity one could require all to be +all, but then this is potentially cumbersome. Another approach is to have a field

skip_generation = ['ohc_timeseries']

which may be clearer, albeit less compact although the code would essentially be doing set operations on these lists either way (obviously post-expansion of terms like all).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwolfram, I think your idea is a good one. I wasn't thinking of my example as being a good way of doing all analysis other than ohc_timeseries, just that it happened to work out that way. But I can see how it came across that way.

Anyway, I agree that there needs to be a way of explicitly turning off an analysis function. The only problem I have with the -ohc_timeseries syntax is that I was thinking of having a similar syntax to generate on the command line, for example:

./run_analysis config.analysis --generate sst_timeseries,all_modelvsobs,all_seice

(I'm open to other command line syntax). I think it might be a bit confusing to have the -ohc_timeseries syntax on the command line (or it might fail completely). Would you be open to no_ohc_timeseries (as well as no_ocean, no_timeseries, etc.) as alternatives to the minus sign? This would be consistent with the approach used in run_test.py --no_init_step1 --no_forward syntax used in the MPAS testcases infrastructure (soon to be renamed COMPASS).

@pwolfram
Copy link
Contributor

@xylar, this looks good to me and is definitely a step in the right direction. I added a minor comment that can hopefully be used to improve clarity even further.

@xylar
Copy link
Collaborator Author

xylar commented Jan 28, 2017

@pwolfram, thanks for the comments. It seems like I should make a design doc for this, which I will try to do in the next few days.

@xylar
Copy link
Collaborator Author

xylar commented Jan 29, 2017

#91 is a design document describing this PR.

@xylar
Copy link
Collaborator Author

xylar commented Jan 29, 2017

I have now implemented and tested "Requirement: a simple way of turning on and off individual" from #89.

config.template Outdated
# ref_casename is the name of a reference run to compare agains (or None
# to turn off comparison with a reference, e.g. if no reference case is
# available)
ref_casename = None
Copy link
Collaborator

@milenaveneziani milenaveneziani Jan 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar: sorry for the delay, I am finally taking a look at this now.
As far as ref_casename goes, I added a '_v0' because we want to keep the reference casename variable different from ACME v0 and v1. We already have a ref_casename to be used to represent a version of ACME greater than 0 (ie, strictly MPAS), so I used ref_casename_v0 for an ACME v0 case (POP based, hence treated as pre-processed data). Mind you, there aren't many v0 simulations we will want to compare to. Here is an exhaustive list:

  1. ACME v0 low-res (1deg atm/land, 1deg POP/CICE)
  2. ACME v0 high-res, pre-industrial (0.25deg atm/land, 0.1deg POP/CICE)
  3. ACME v0 high-res, present-day transients (same resolution as 2)

So, to summarize, we want to keep ref_casename_v0 separate from ref_casename (just to be clear, we are not using the latter, currently, but it will be nice to in the future).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milenaveneziani, thanks for the clarification here. I think we should discuss this in our call tomorrow.

My feeling is that there should be as little as possible in both the config file and in the analysis itself that is ACME specific. Instead, it might make sense to support analysis for a reference run that has the same formatting at the main run (e.g. a run directory and corresponding namelist and streams files) and also for another preprocessed format, of which v0 runs would be one example. Let's discuss what the appropriate option names would be and which section(s) they would go into.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your points @xylar. I am open to renaming things. We could have a section called ocean_reference_nonMPAS or something along those lines.

config.template Outdated
# names of namelist and streams files

# directory containing model results
basedir = /dir/to/model/output
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a heads up, it may be worthwhile to note here that eventually we will have to support the option that model output has been short-term archived. For the time being, short-term archive is off and all model results and files are in one single output directory. When short-term archiving is on, ocean files will go in one directory, sea-ice files in another, so on so forth. The ACME preAndPostProcessing script now has a flag to indicate whether short-term archiving is on or off, so we could use that in the future to define a basedir directory differently for each model component, if short-term archiving is on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milenaveneziani, I think maybe this issue is too big to address in this PR and should be make a separate issue.

Again, I feel strongly that any solution shouldn't be aware of the concept that are ACME-specific such as "short-term archiving". Instead, we need to have a flexibility in finding the appropriate files (both the data sets themselves and their corresponding namelist and streams files). I have a feeling this could be logistically challenging because files may not be in the locations specified relative to the streams files after archiving (unless the namelist and streams files are archived along with the data, which would be very helpful).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can tell you what CESM used to do (I don't think we ever used short-term archiving with ACME because it does not work for MPAS): all the logs files were saved in a logs/ directory, and all the files needed for restart (and that I suppose will include namelists and streams for MPAS) were saved in a rest/ directory.
So, yes, it will be challenging, or maybe not if we have a separate section for restart files and namleists/streams. But I agree that we shouldn't worry about this in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in understanding that we can support relative path names with respect to basedir? If basedir=/, then we would essentially have to have the full path for all associated and derived paths, correct? If so, this appears to add syntactic sugar without designing away capability.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwolfram, yes, I think you're right that you could give basedir = / if you wanted to give the others as absolute paths. I wouldn't expect that to be a frequent use case though.

config.template Outdated
obs_mlddir = /global/project/projectdirs/acme/observations/Ocean/MLD
obs_seaicedir = /global/project/projectdirs/acme/observations/SeaIce
ref_archive_v0_ocndir = /global/project/projectdirs/acme/ACMEv0_lowres/B1850C5_ne30_v0.4/ocn/postprocessing
ref_archive_v0_seaicedir = /global/project/projectdirs/acme/ACMEv0_lowres/B1850C5_ne30_v0.4/ice/postprocessing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, we want to preserve the '_v0' part here, to differentiate it with a ref_case that is v1 and beyond.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this. I see the need for something equivalent to "v0" but I am opposed to the ACME-specific nature of the name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

config.template Outdated
# Each subsequent list entry can be used to alter previous list entries. For
# example, the following would produce all analyses except ocean model vs.
# observations (albeit not in a very intuitive way):
#generate = ['all', 'no_ocean', 'all_timeseries']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great to have all these options that can be specified at run time!

not sure if this is a good place to do this, but I thought we could start renaming the modelvsobs scripts to something like regriddedHorSlice or interpHorSlice (with a VertSlice counterpart, eventually). And then compare to obs only in the case that obs are actually available for comparison.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milenaveneziani, let's discuss this rename. Potentially, it's a good time to do such a rename. I'm not a big fan of these shortened names. I much prefer a more robust name where words are written out in full. I also don't think these are typically "slices" in the same sense as our vertical transects so I would pick something more like regridHorizontalField. But I also want to be careful about having analysis names that sound too much like intermediate actions that are part of the analysis. In this case, the purpose of the script (what it actively does) is ultimately to plot something, not to regrid something. So I think these names should get some careful thought.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, I am very open to choice of names, and I also tend to like long names. How about plotRegridHorizontalField?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep something like that would work very well for me. I would suggest plotRegriddedHorizontalField so that there's only one active verb: "What are we plotting? A regridded horizontal field!" ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for names in CamelCase that are more vs less descriptive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use this PR to clean up the formatting of all the variables in the config file using CamelCase. I would be in favor of that. I would not be in favor of expanding this PR to clean up the whole repo to use CamelCase, but we could do that at a later stage, preferably before things progress too far with the current random mix of case conventions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we don't want the entire repo to use CamelCase, at least at this point, but it does appear to be useful for the config files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

@milenaveneziani
Copy link
Collaborator

@xylar: should we add a few config.templates (the one you have works for edison and we could call it config.template.edison), one for LANL IC and one for rhea/OLCF, for example, and add them to this PR? We can then solve #80 at the same time.

@@ -29,3 +29,8 @@ You can easily install them via the conda command:
```
conda install -c scitools -c https://conda.anaconda.org/opengeostat numpy scipy matplotlib ipython notebook netCDF4 progressbar vtk cartopy xarray dask bottleneck pyevtk numexpr basemap
```

## Running the analysis
1. create a configuration file by copying and modifying `config.template` or one of the example files in `configs`
Copy link
Collaborator

@milenaveneziani milenaveneziani Feb 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we exclude the custom config files in gitignore? A quick google search revealed that we could have something like this in .gitignore:
/config.*
!/config.template

which basically means ignore all config.* but config.template.

@milenaveneziani
Copy link
Collaborator

@pwolfram: I am working on some examples for LANL machines and rhea.
Small comment: I am not sure we need to add subdirectories to configs/$machinename/ for specific runs (beta0/, etc). Seems like machine subdirectories are enough (and the code information could be added to the config.template name, as in the current example).

run_analysis.py Outdated
print ""
print "Plotting SST time series..."
from mpas_analysis.ocean.sst_timeseries import sst_timeseries
sst_timeseries(config, streamMap=oceanStreamMap,
variableMap=oceanVariableMap)

if config.getboolean('nino34_timeseries', 'generate'):
if checkGenerate(config, analysisName='timeSeriesNino34',
mpasCore='ocean', analysisCategory='timeSeries'):
print ""
print "Plotting Nino3.4 time series..."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar: I would comment out the print statements that are associated with analysis that is not yet available, because they may be cause of confusion.

@milenaveneziani
Copy link
Collaborator

@xylar: I have included two config files for LANL and OLCF machines, and pushed my changes to config_reorg on my fork. I have done some testing with these config files and everything seems to work just fine.
I would like to add one example of config file for an alpha7 run on edison: will do it later and report here.

@xylar
Copy link
Collaborator Author

xylar commented Feb 4, 2017

@milenaveneziani, thanks for the comments. I think I have addressed them all and I brought in your two example config files, which look great. I agree that there's no need for beta0 directories. I thought it might make specific configurations easier to find but we can always add subdirectories later if they seem necessary.

I will add a readme to the configs directory making clear that the intended usage is to copy these files to the root and modify them, rather than to modify them in place. We don't want people editing the output baseDirectory, for example, and committing it back to the repo.

@xylar
Copy link
Collaborator Author

xylar commented Feb 6, 2017

I just added another config file for LANL. This one is for a small, short run on wolf that I've been using for debugging. I tested the branch there and it ran successfully.

@milenaveneziani, if you would be up for generating the mapping file that would be needed for the sea-ice climatology analysis in that run, we could use this run as a more complete test of the analysis. If you'd rather wait on that, I will eventually (in a few weeks to a month, hopefully) have a version of the interpolation code that doesn't require having a mapping file.

@milenaveneziani
Copy link
Collaborator

@xylar: I talked with @maltrud about generating the mapping file for the 240. He usually prefers to handle it himself because he also takes care that the (sea-ice) observations are interpolated onto the same grid as the model data.

@xylar
Copy link
Collaborator Author

xylar commented Feb 6, 2017

Ah, okay. Sounds good.

@xylar
Copy link
Collaborator Author

xylar commented Feb 6, 2017

I just added the preprocessed reference cases to che LANL config files. I re-tested the runs and they work.

@pwolfram
Copy link
Contributor

pwolfram commented Feb 8, 2017

@xylar and @milenaveneziani, LGTM.

@xylar
Copy link
Collaborator Author

xylar commented Feb 8, 2017

Thanks, @pwolfram!

When Edison is back up, I'd suggest that @milenaveneziani and I run some tests. Then, I'll squash a bunch of the higher level commits. Then, I think we're ready for @milenaveneziani to do the merge? And probably that of #91 at about the same time?

@milenaveneziani
Copy link
Collaborator

@xylar: sounds good.
Also, @pwolfram and I just had a good conversation on the phone and we agreed on a short-term plan: 1) I can review most of the time-related PR's, and Phillip will just keep an eye opened to make sure that there isn't something that he would like to double check (#104 could be an example, since it involves mpas_xarray). I will also work on the MOC script; 2) Phillip will concentrate on things that are more xarray related, such as the 'Too many open files' issue he has been working on lately.
Does this sound OK?

xylar added 4 commits February 8, 2017 22:55
Rename config.analysis to config.template and perform some major
conceptual reorganization and cleanup.  Note: corresponding changes
to the code have not been made, so this is simply a concept for
motivating the required code changes.

Change how analyses are turned on/off ('generate')

With this merge, there is a single list 'generate' in the
'[output]' section that determines which analyses are run.  This
option replaces the 'generate' options in in the sections for
each specific analysis module.

The syntax for generate is described in config.template as follows.

'generate' is a list of which analyses to generate.  Valid names are:
   'ohc_timeseries', 'sst_timeseries', 'sst_modelvsobs', 'sss_modelvsobs',
   'mld_modelvsobs', 'seaice_timeseries', 'seaice_modelvsobs'
 the following shortcuts exist:
   'all' -- all analyses will be run
   'all_timeseries' -- all time-series analyses will be run
   'all_modelvsobs' -- all analyses comparing model climatologies with
                       observatoins will be run
   'all_ocean' -- all ocean analyses will be run
   'all_seaice' -- all sea-ice analyses will be run
   'no_ohc_timeseries' -- skip the 'ohc_timeseries' (and similarly with the other analyses).
   'no_ocean', 'no_timeseries', etc. -- in analogy to 'all_*', skip the given category of analysis

With a command line argument `--generate`, a comma-separated list
can be used to select which analyses are performed.  The command-line
argument overrides the config option.

Move input/output paths to config sections

This merge moves the input path to 'basedir' in [input] and
the output path to 'basedir' in [output].  The subpaths in
output are no longer full paths, just subpaths off of 'basedir'.

A few formatting issues were also cleaned up, primarily changing
tabs to spaces.

Switch observation and reference sims. to own sections

Switch the folders for observations and reference simulations in
the config file to their own files (ocean_observations,
ocean_reference, etc.).

Make 'climatology' and 'time_series' config sections

These now contain the bounds in time of their respective analyses.

ref_casename_v0 --> ref_casename

This change has so far only been made in the config file and where
the ref_casename is parsed.  More work should be done in the future
to make sure other reference cases than ACME v0 can be used here.

Update config.template to use CamelCase

Some other minor simplifications and reorganization is also included.

Update run_analysis to be compatible with new config

run_analysis now uses CamelCase throughout.  Docstring have been
added to utility functions.

A new utility function has been added that makes a full path from
a base directory and relative path read from a config file.

Update analysis and plotting utilities based on new config

The ocean.timeseries_ohc and shared.plot.plotting have been updated
to be compatible with the new config template.

Analysis functions have also been updated to use CamelCase and a few
minor formatting and PEP8 issues have been addressed
Add a README in the configs directory
Also mention the `generate` option and command-line flag
@xylar xylar changed the title Reorganize config.analysis --> config.tempalate Reorganize config.tempalate (formerly config.analysis) Feb 8, 2017
@xylar
Copy link
Collaborator Author

xylar commented Feb 8, 2017

@milenaveneziani and @pwolfram, that plan sounds good to me!

@xylar
Copy link
Collaborator Author

xylar commented Feb 8, 2017

I rebased, so this PR is just awaiting testing on Edison once /project is back up. Obviously that won't happen today (for me at least).

@milenaveneziani milenaveneziani merged commit 682cf97 into MPAS-Dev:develop Feb 10, 2017
@milenaveneziani
Copy link
Collaborator

@xylar: I added the correct remap file in the example config file for the GMPAS-240km case on LANL IC machines (hope that was OK with you), and then tested it. Everything worked fine, so I went ahead and merged this.

@xylar xylar deleted the config_reorg branch February 10, 2017 07:01
@xylar
Copy link
Collaborator Author

xylar commented Feb 10, 2017

Yep, that's perfect! Thank you, @milenaveneziani.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants