Skip to content

Conversation

vanroekel
Copy link
Collaborator

This adds a MLD model vs observations plotting script (against the holte and talley data set). The previous sst_modelvsobs.py has been subsumed into ocean_modelvsobs.py. I also moved the shared interpolation code to its own module in mpas-analysis/shared

I've tried to reduce the code and computations as well. Previously, BIAS comparisons were calculated at every month and seasonally even if only a few were selected. Now only those times selected are computed.

@vanroekel
Copy link
Collaborator Author

Testing:

I ran this on edison and my local laptop. A few images are plotted below. The directories are in the config.analysis within this PR.

mldholtetalleyargo_20160805v0atm a_wcycl1850_v0atm ne30_oec edison alpha8_jas_years0006-0010
mldholtetalleyargo_20160805v0atm a_wcycl1850_v0atm ne30_oec edison alpha8_jfm_years0006-0010

I was not sure if I should commit changes to the config.analysis. Since I added options, I thought it was important.

archive_dir_ocn = /global/cscratch1/sd/golaz/ACME_simulations/20160805.A_WCYCL1850.ne30_oEC.edison.alpha7_00/run/
scratch_dir = /global/project/projectdirs/acme/lvroekel/20160805v0atm.A_WCYCL1850_v0atm.ne30_oEC.edison.alpha7_00.test.pp
plots_dir = /global/project/projectdirs/acme/lvroekel/coupled_diagnostics_20160805v0atm.A_WCYCL1850_v0atm.ne30_oEC.edison.alpha7_00-20160520.A_WCYCL1850.ne30_oEC.edison.alpha6_01
log_dir = /global/project/projectdirs/acme/lvroekel/coupled_diagnostics_20160805v0atm.A_WCYCL1850_v0atm.ne30_oEC.edison.alpha7_00-20160520.A_WCYCL1850.ne30_oEC.edison.alpha6_01.logs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to change the defaults here, although it probably doesn't fully matter as long as particular entries aren't removed. However, it would be easy in the future to accidentally remove entires. @xylar should comment on this too.

@@ -42,10 +43,10 @@ climo_yr2 = 10
yr_offset = 1849

[ohc_timeseries]
generate = 1
generate = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idea for changing logical flags-- I don't think these should be edited unless absolutely necessary.

from ..shared.interpolation.interpolate import interp_fields, init_tree

def mld_modelvsobs(config):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing doc string describing function and its purpose.

infiles = "".join([indir,"/am.mpas-o.timeSeriesStats.????-*.nc"])

monthdictionary={'Jan':1,'Feb':2,'Mar':3,'Apr':4,'May':5,'Jun':6,'Jul':7,'Aug':8,'Sep':9,'Oct':10,
'Nov':11,'Dec':12,'JFM':np.array([1,2,3]),'AMJ':np.array([4,5,6]),'JAS':np.array([7,8,9]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First line non-white space characters should be aliged with { above.


# Load in model data
ds = xr.open_mfdataset(infiles, preprocess=lambda x: preprocess_mpas(x, yearoffset=yr_offset,
timeSeriesStats=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue with indentation as above (timeSeriesStats should be left justified with inflies

lonCell = f.variables["lonCell"][:]
latCell = f.variables["latCell"][:]

# Load observation MLD data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment indentation off from code indentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other places like this in this document too that have this issue.


dinmonth = np.array([31,28,31,30,31,30,31,31,30,31,30,31])

daysarray = np.ones((12,dsData.mld_dt_mean.values.shape[1],dsData.mld_dt_mean.values.shape[0]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces are needed after commas. This is an issue that is throughout PR.


else:
modeldata = np.sum(dinmonth[monthsvalue-1]*monthly_clim.sel(month=monthsvalue).time_avg_dThreshMLD.values.T,axis=1) / np.sum(dinmonth[monthsvalue-1])
obsdata = np.nansum(daysarray[monthsvalue-1,:,:]*dsData.sel(iMONTH=monthsvalue).mld_dt_mean.values.T,axis=0) / np.nansum(daysarray[monthsvalue-1,:,:],axis=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need a line break on these really long lines

diffTitle = "Model-Observations",
cbarlabel = r"$^o$C")


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide a main type function to help modularize this and allow it to be run in script mode with command line arguments? This helps generalize the code and increase its utility, e.g., something like

if __name__ == "__main__":
    # process command line arguments and feed to main below
    main()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwolfram could you comment on a use case that wouldn't be covered by disabling other diagnostics except these in the run_analysis? I could do this, but I can't see a pressing use.

@@ -0,0 +1,47 @@
import numpy as np
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments about docstring above. It would also be good to include some testing functions in this module too so that we can verify it is working properly. This can be as simple as functions with name test_* that are called in a main file when this is run as a script. Assert statements can be used to ensure desired behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a docstring on this module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably also need tests although we can wait on that if necessary for now.

@@ -86,8 +86,14 @@ def analysis(config): #{{{
if config.getboolean('sst_modelvsobs','generate'):
print ""
print "Plotting 2-d maps of SST climatologies..."
from mpas_analysis.ocean.sst_modelvsobs import sst_modelvsobs
from mpas_analysis.ocean.ocean_modelvsobs import sst_modelvsobs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now-- you are generalizing the above function. Very good.

@pwolfram
Copy link
Contributor

@xylar, my understanding is that new analysis should have registered additions to config.analysis, right? However, I'm not sure we want to have changes to default values unless absolutely necessary. What we do need are addition of new variables, right?

I'm guessing a good workflow here would be to include changes to config.analysis that are required for testing in a gist. But I'm not 100% sure if that is the best way to proceed.

@pwolfram
Copy link
Contributor

@vanroekel, I've added some comments to the code. Feel free to not take the style comments too seriously for now (e.g., white space issues, etc.). I think we do want more doc strings to document the code than currently exist, however. I think we will want to squash some commits prior to merging this too.

@pwolfram pwolfram self-assigned this Oct 21, 2016

infiles = "".join([indir,"/am.mpas-o.timeSeriesStats.????-*.nc"])

monthdictionary={'Jan':1,'Feb':2,'Mar':3,'Apr':4,'May':5,'Jun':6,'Jul':7,'Aug':8,'Sep':9,'Oct':10,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things like this should probably be in a shared constants location.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, shared/constants.py or some such would work

latDATA = latDATA.flatten()
lonDATA = lonDATA.flatten()

dinmonth = np.array([31,28,31,30,31,30,31,31,30,31,30,31])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should be in constants file

@vanroekel
Copy link
Collaborator Author

Okay @pwolfram, I've updated everything except the main function issue. I may have missed a few other minor formatting things perhaps.

@pwolfram
Copy link
Contributor

@vanroekel, can you please add some tests to mpas_analysis/shared/interpolation/interpolate.py if possible? Can you also please let me know where you are testing on edison and I'll double check everything is working?

@vanroekel
Copy link
Collaborator Author

@pwolfram the config.analysis script I used on edison is at
'''
/global/homes/l/lvroekel/MPAS-Analysis/config.analysis
'''

I will try add tests to interpolate.py soon. I still need to read through your example and the xarray example.

@pwolfram
Copy link
Contributor

@vanroekel, I think that you just need functions of the form test_* in the interpolate.py file. Pytest should then run this function and make sure it exits cleanly, e.g., you need assert statements in the function. If you write the tests (even in pseudo-code) I'll help you get them edited so that pytest can use them.

@pwolfram
Copy link
Contributor

FYI, I will postpone testing for now until we more or less have the code we want to commit to ensure that the full testing is as close to the final version as possible.

@pwolfram
Copy link
Contributor

@vanroekel, one question / idea- would it be possible to further generalize the *modelvobs* functions with respect to sea ice too? I took a quick look at this today and it looks like there may be even more opportunities for simplification to reduce the total amount of code.

cc @milenaveneziani

@vanroekel
Copy link
Collaborator Author

@pwolfram and @milenaveneziani, looking at the sea-ice code, I think it certainly may be possible to generalize all of these into one. It fits with how I was eventually envisioning the SST and MLD comparisons as a plot2d_fields_comparison, where no data field is assumed or if it is model to obs. I think the simplification would take some thought as there seem to be enough differences in the sea-ice and ocean plotting scripts.

@milenaveneziani
Copy link
Collaborator

@vanroekel, @pwolfram: first of all, thank you for working on this. It's great to have a general routine to do global 2d climatological maps.

There are a few reasons why I think we are not ready to generalize the sea-ice script as well:

  1. We need a shared script that does climatologies. It should be flexible enough that it does various seasons (for sea-ice, for example, we need JFM, but also DJF and FM).
  2. The interpolation routine should allow for a choice of interpolation techniques. The sea-ice script for example uses ncremap, and the SST script uses the python interpolation (not sure if this changed in the mean time). We should allow for both methods (and possibly more options), and then we should stick with one for all of our diagnostics.
  3. We need to decide what to do in terms of observational data processing. When we talked about this, I think we decided that the data should be pre-processed (pre-compute climatologies, and re-grid). I can't remember if this is an issue in this case: @vanroekel, is the SST/MLD data set pre-processed here?
  4. With sea-ice, we are plotting 2 polar hemispheres on a steoreographic projection. With the global modelvsobs we are plotting on a global projection and need less input variables.
    So, it seems to me that eventually we may be able to unify the ocean and sea-ice scripts, but perhaps not in the immediate future (in the sense that we need to put out other fires first).
    Do you agree?

@vanroekel
Copy link
Collaborator Author

@milenaveneziani these are some of the exact pieces I was thinking about. The SST/MLD data is not preprocessed. For merging with sea-ice this would have to change. Not a hard change though.

I would tend to agree that merging sea-ice and ocean model vs obs is a longer term goal, but should be a goal.

@pwolfram
Copy link
Contributor

pwolfram commented Oct 24, 2016

@vanroekel and @milenaveneziani, should we just merge the existing code once we are all happy with this PR? I'm ok waiting to do a more generalized aggregation of the scripts and at this point agree that it is wise to wait for the generalization, which we can keep track of in #31.

@vanroekel
Copy link
Collaborator Author

@pwolfram I would be in favor of merging this PR prior to generalization. I'm working on the unit testing now. So I'd like to wait for these additions.

@milenaveneziani
Copy link
Collaborator

@pwolfram: I would also merge this prior to generalization.
@vanroekel: I think you mentioned that you also tested this within the ACME analysis framework, right?

@pwolfram
Copy link
Contributor

@vanroekel, thanks for moving the testing file. pytest checks out now.

@pwolfram
Copy link
Contributor

Also, this points to the need for CI regression testing as we have discussed earlier now that we are getting some regression tests.

@vanroekel
Copy link
Collaborator Author

@pwolfram I gave you the wrong path in the config.analysis file, see the new version in same location. The path to files should be

'''
/global/cscratch1/sd/golaz/ACME_simulations/20160805.A_WCYCL1850.ne30_oEC.edison.alpha7_00/run/
'''

@milenaveneziani
Copy link
Collaborator

@vanroekel: are you asking to move the MLD observation data to edison? Sorry if I haven't done this already, but you only have the data on IC, right? I think I also have it at OLCF, so I can look into that.
@pwolfram: that directory of ACME model output files no longer exists at NERSC. We have used a different filepath recently (in run_EDISON_mv or also in Phillip's version, I believe).

Having said that, as I mentioned to Luke already, we should not test any of this within the ACME repo just yet (especially now that we are dealing with a troublesome merge to master). I think the workflow in the future should be similar to that of MPAS and ACME main codes: we make a bunch of changes in MPAS-Analysis, test them thoroughly, update the ACME analysis repo locally and do more testing, then merge to the ACME analysis master. Or at least this seems like the safer way to me.

@vanroekel
Copy link
Collaborator Author

@milenaveneziani, the data is on edison at /global/homes/l/lvroekel/MLD

@pwolfram, there is an issue in the code right now in testing on edison. I'm debugging.

@vanroekel
Copy link
Collaborator Author

@pwolfram, confirmed the code just pushed works on edison.

@xylar
Copy link
Collaborator

xylar commented Oct 26, 2016

@milenaveneziani wrote:

I think the workflow in the future should be similar to that of MPAS and ACME main codes: we make a bunch of changes in MPAS-Analysis, test them thoroughly, update the ACME analysis repo locally and do more testing, then merge to the ACME analysis master. Or at least this seems like the safer way to me.

I agree that this approach is needed. One way to handle this without disrupting our workflow would be to introduce a develop branch, and to treat master as a release branch (or rename it). Matt Hoffman might have suggestions for how to make this happen (but he's not on this repo yet).

@xylar
Copy link
Collaborator

xylar commented Oct 26, 2016

I went ahead and created a develop branch. I figured it can't do any harm. I'm going to change this PR to merge into develop for now, Let me know if that worries any of you.

@xylar xylar changed the base branch from master to develop October 26, 2016 08:25
@pwolfram
Copy link
Contributor

pwolfram commented Oct 27, 2016

@xylar, in the future we all should socialize major changes to the repository with the group, e.g., the introduction of the develop branch. This is our best practices and although this isn't a big issue here there could be good reasons why we would not want to make unilateral changes like this without getting further buy-in from the whole group. It is possible this already happened and there was consensus at a meeting I missed this week because of traveling. If so, my apologies about needlessly raising this issue. Either way, this is not a big deal, but I think we will get a better product if we do our best to socialize larger changes. Please let me know if I'm not following this practice too.

Just realized the above line by @xylar was the buy-in step. However, perhaps we should formalize these things as github issues for discussion in the future to make things a little more transparent.

Note, for the record, @xylar's idea of develop is a really good idea because it provides us an additional safety net that we can use to resolve serious issues if something goes wrong prior to merging to master.

@pwolfram
Copy link
Contributor

I'll be merging to develop for this PR following testing. @vanroekel, permissions are broken again for the config file at/global/homes/l/lvroekel/config.analysis and I'm guessing your cached data has this same issue and needs reset permissions. Essentially I just want to make sure I'm using the right configuration file on edison to test this prior to this merge.

@vanroekel
Copy link
Collaborator Author

@pwolfram sorry for the permissions issue. Try now.

@pwolfram
Copy link
Contributor

@vanroekel, does this error make sense to you?

┌─[pwolfram][edison03][~/test_MA_vanroekel][21:46][±][develop ✓]
└─▪ python run_analysis.py config.analysis.test 

Plotting 2-d maps of SST climatologies...
Traceback (most recent call last):
  File "run_analysis.py", line 133, in <module>
    analysis(config)
  File "run_analysis.py", line 90, in analysis
    ocn_modelvsobs(config, 'sst')
  File "/global/u2/p/pwolfram/test_MA_vanroekel/mpas_analysis/ocean/ocean_modelvsobs.py", line 153, in ocn_modelvsobs
    if np.issubdtype(monthsvalue, int):
  File "/global/homes/p/pwolfram/miniconda2/lib/python2.7/site-packages/numpy/core/numerictypes.py", line 761, in issubdtype
    return issubclass(dtype(arg1).type, val)
TypeError: data type not understood

@vanroekel
Copy link
Collaborator Author

vanroekel commented Oct 27, 2016

@pwolfram, yes, I had this and thought I fixed it. This line (153)

np.issubdtype(monthsvalue, int):

should be

isinstance(monthsvalue, (int,long))

I had fixed this and thought I pushed changes, but obviously not. I can clean this up tomorrow morning

@pwolfram
Copy link
Contributor

@vanroekel, thanks- sounds good.

@xylar
Copy link
Collaborator

xylar commented Oct 27, 2016

@pwolfram wrote:

However, perhaps we should formalize these things as github issues for discussion in the future to make things a little more transparent.

Yes, I think you are right. I felt a little bit conflicted about unilaterally adding a develop branch. At the same time, I felt like there was some urgency since a solution was needed to prevent conflicts with ACME and to allow testing of a number of changes before pulling them into ACME. I was afraid the damage would be done before the discussion could take place. I'll create an issue page now.

@xylar xylar mentioned this pull request Oct 27, 2016
This is the first attempt at unit testing in the interpolate.py module.
It tests all functionality within that bit of code using pytest

Adds an arrayAssertEqual, greaterThan, and lessThan

Fixes trailing white space and int to floats

Justify issue fixed

Updates testing

Fixes a bug

Small fix
@vanroekel
Copy link
Collaborator Author

@pwolfram the small fix described above has been pushed. Should be ready for you to test once cscratch on edison is back.

@xylar
Copy link
Collaborator

xylar commented Oct 27, 2016

CSCRATCH will be down for a couple of days. Do we have any alternative for testing in the meantime?

@vanroekel
Copy link
Collaborator Author

The files I used for testing are also on hpss at nersc. I will try download them to /scratch2/scratchdirs/lvroekel for @pwolfram to finish testing this PR

@xylar
Copy link
Collaborator

xylar commented Oct 27, 2016

@vanroekel, that's great. Can you give me permission to see your scratch folder? That way I can use the files to test #38 as well (once I rebase onto this branch).

@vanroekel
Copy link
Collaborator Author

@xylar , I am still downloading data, as soon as it is all there (Should be done shortly), I'll alter permissions. Do you need more than the sea ice and ocean timeSeriesStats files?

@vanroekel
Copy link
Collaborator Author

@xylar and @milenaveneziani I downloaded timeseriesstats ocean and sea-ice files to scratch2 for testing. Permissions should be okay for all to see, if not let me know. If you need additional files, please let me know as well.

@pwolfram pwolfram merged commit 1325ca9 into MPAS-Dev:develop Oct 28, 2016
@pwolfram
Copy link
Contributor

pwolfram commented Oct 28, 2016

┌─[pwolfram][edison08][~/test_MA_vanroekel][19:58][±][ ✓]
└─▪ python run_analysis.py config.analysis.test 

Plotting 2-d maps of SST climatologies...

Plotting 2-d maps of MLD climatologies...

and output is visually indistinguishable from posted photos above in PR description.

Thanks @vanroekel, @xylar, and @milenaveneziani for sticking together so well through these HPC challenges. Great teamwork everyone!

@pwolfram pwolfram mentioned this pull request Dec 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants