pwolfram
Contributor

@pwolfram pwolfram commented Oct 6, 2016

This is a prototype reader / writer for interfacing with namelist and streams files. It will ultimately be needed to support the generalization identified in #20.

A list of preliminary features (included or could be included):

  • Read-only status for the classes which build the reader/writer namelist and streams objects
  • Pure-Python implementation that does not require command-line tools (e.g., awk or sed) or calls to the shell
  • Type conversion, especially for numbers, times, and logical values

@pwolfram
Contributor Author

pwolfram commented Oct 6, 2016

@xylar and @milenaveneziani, this is the start of a set of classes which we can use to read / write namelist and streams files. At this point it is probably "prototype" quality code. Please let me know what you think. I'm thinking we should use this as a "straw-man" to build out a general capability to manipulate namelist and streams files.

I'm putting this out here to stimulate discussion and as a starting point for the changes we need to generalize the code. I fully expect many or all of the lines in this file to be rewritten or adapted to our needs.

@pwolfram
Contributor Author

pwolfram commented Oct 6, 2016

@xylar and @milenaveneziani, the thing we need to focus on here is the API for the classes that interface with the namelist and streams files, e.g.,

# check whether global stats is on
nl = Namelist(nlistpath)
dt = nl.read('config_AM_globalStats_enable')

# get name for mesh file
sf = XMLList(streamspath)
meshname = sf('mesh', 'filename_template')

Once we have a good handle on this we should be able to write the necessary functionality that we need.
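
As a rough illustration of the streams side of this API, the sketch below looks up an attribute of a named stream. It assumes the standard-library xml.etree.ElementTree rather than lxml, and the class name StreamsReader, the sample XML, and the file names in it are hypothetical; it is not the XMLList implementation from this PR.

```python
import xml.etree.ElementTree as ET

# Hypothetical streams content; real streams files carry many more attributes.
STREAMS_XML = """\
<streams>
  <immutable_stream name="mesh" type="none" filename_template="mesh.nc"/>
  <stream name="output" type="output" filename_template="output.$Y.nc"/>
</streams>
"""


class StreamsReader:
    """Read-only lookup of stream attributes (illustrative only)."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def read(self, streamname, attribname):
        # Search every element, whatever its tag, for a matching name attribute.
        for element in self.root.iter():
            if element.get("name") == streamname:
                return element.get(attribname)
        raise KeyError(streamname)


sf = StreamsReader(STREAMS_XML)
meshname = sf.read("mesh", "filename_template")
```

Raising KeyError for an unknown stream is one possible choice; returning None would also be defensible, but it makes typos in stream names harder to notice.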

@milenaveneziani
Collaborator

@pwolfram: this sounds good to me. How do you suggest we should test it? With an example script, or by modifying one of the scripts that we already have to do plotting/analysis?

@pwolfram
Contributor Author

pwolfram commented Oct 6, 2016

@milenaveneziani, this brings up the larger question of having unit tests. We could use pytest for that and essentially ensure that different parts of the code are doing precisely what they need to do to meet our requirements. There would be a new test folder that contains these unit tests to ensure that the code is working properly.

@pwolfram
Contributor Author

@milenaveneziani, I've pushed some changes that include unit tests via the pytest framework (similar to what xarray uses). Basically when you are in the folder you can type pytest to run the unit tests. I believe you'll need to conda install pytest to use this testing framework. Essentially we can use this to build out the key unit tests we need in the model.
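
For readers unfamiliar with pytest, a test module can be as small as the sketch below. pytest collects functions whose names start with test_ and reports any failing assert. The helper parse_namelist_line and the file name are hypothetical and are not part of this PR.

```python
# test_namelist_sketch.py (hypothetical file name)


def parse_namelist_line(line):
    """Split 'name = value' into (name, value), stripping spaces and quotes."""
    name, _, value = line.partition("=")
    return name.strip(), value.strip().strip("'\"")


def test_parses_quoted_string():
    assert parse_namelist_line("config_dt = '00:10:00'") == ("config_dt", "00:10:00")


def test_parses_bare_value():
    assert parse_namelist_line("  config_num_halos = 3") == ("config_num_halos", "3")
```

Running pytest in the directory that contains this file would discover and run both tests; no import of pytest is needed for plain assert-based tests.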

@pwolfram
Contributor Author

Note @xylar and @milenaveneziani, this is still somewhat rough, but one end goal here is to get automatic testing for each PR via pytest to ensure that we don't accidentally break functionality that is important as we modify the repo. We will likely want to modify the interfaces used for the namelist and streams reader / writer.

@@ -0,0 +1,91 @@
#!/usr/bin/python
Collaborator

Should be /usr/bin/env python

10/07/2016
"""

import os
Collaborator

Is this needed?

"""

import os
import pytest
Collaborator

Is this needed?

self.setup_namelist(readonly=True)
with self.assertRaisesRegexp(AssertionError, 'Cannot write to namelist file .* because readonly=True'):
    self.nl.write('config_dt', '00:00:00')

Collaborator

@requires_lxml needed?

class XMLList:
"""
Class to read in streams configuration file, provides
read and write functionality
Collaborator

I'm not sure I understand why we would want write functionality as part of this repo. Can you suggest a case where writing or modifying a streams file might make sense?

#print command
result = call(command,shell=True)

class XMLList:
Collaborator

I think the class name is too general for the specific usage (or perhaps the description of the class is too specific)

self.write(self.fname+'.backup')

def read(self, streamname, attribname):
""" name is a list of name entries terminating in some value
Collaborator

Could you fix the docstring? It makes no sense to me currently.

self.readonly = readonly
self.xmlfile = etree.parse(fname)
self.root = self.xmlfile.getroot()
if backup:
Collaborator

If you do decide to keep write/modify functionality, I think backup=True only makes sense if readonly=False.

else:
    print "%s was not changed to %s because it didn't exist and we aren't setting new fields!"%(attribname, value)

def write(self, fname=None):
Collaborator

It seems odd to me that write doesn't check if readonly==True, especially if fname=None or fname==self.fname.
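
One way to address this comment and the earlier one about backup is sketched below. It is a minimal illustration under stated assumptions: the class name GuardedFile is hypothetical, it treats the file as plain text rather than XML, and it only blocks overwriting the source file, which is one possible reading of the review comment.

```python
import os
import shutil


class GuardedFile:
    """Hypothetical wrapper showing readonly/backup handling only."""

    def __init__(self, fname, readonly=True, backup=False):
        # A backup is only useful if the file can later be modified.
        if backup and readonly:
            raise ValueError("backup=True only makes sense when readonly=False")
        self.fname = fname
        self.readonly = readonly
        with open(fname) as f:
            self.text = f.read()
        if backup:
            shutil.copyfile(fname, fname + ".backup")

    def write(self, fname=None):
        target = self.fname if fname is None else fname
        # Refuse to overwrite the source file of a read-only object.
        same_file = os.path.abspath(target) == os.path.abspath(self.fname)
        if self.readonly and same_file:
            raise IOError("Cannot write to %s because readonly=True" % target)
        with open(target, "w") as f:
            f.write(self.text)
```

A stricter variant would refuse every write when readonly=True, regardless of the target path; which behavior is preferable depends on whether read-only objects should be allowed to export copies.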


def read(self, name):
# shell return value
return_val = check_output(['awk', '/'+name+'/{printf $3}', self.fname])
Collaborator

As you mentioned in the PR description, it would be preferable to have pure Python. I would think it would make more sense to read in the full file on init and create a dictionary from the names and values. Then, the various get functions (get, getint, getfloat, getbool) that I suggest we use instead of read would just return the dictionary value, possibly with the appropriate casting.
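
The approach suggested here could look roughly like the following sketch. It assumes one "name = value" assignment per line and ignores Fortran namelist features such as trailing commas, arrays, and comments. The class name NamelistSketch, the sample values, and the option config_hypothetical_coeff are illustrative assumptions, not the code that was merged.

```python
import re

# Hypothetical namelist text; most option names are taken from the discussion.
NAMELIST_TEXT = """\
&time_integration
    config_dt = '00:10:00'
    config_time_integrator = 'split_explicit'
    config_hypothetical_coeff = 1.5e-3
/
&decomposition
    config_num_halos = 3
    config_explicit_proc_decomp = .false.
/
"""

_ASSIGNMENT = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")


class NamelistSketch:
    def __init__(self, text):
        # Parse the whole file once and keep the raw strings in a dictionary.
        self.options = {}
        for line in text.splitlines():
            match = _ASSIGNMENT.match(line)
            if match:
                name, value = match.groups()
                self.options[name] = value.strip("'\"")

    def get(self, name):
        return self.options[name]

    def getint(self, name):
        return int(self.options[name])

    def getfloat(self, name):
        return float(self.options[name])

    def getbool(self, name):
        value = self.options[name].strip(".").lower()
        if value in ("true", "t"):
            return True
        if value in ("false", "f"):
            return False
        raise ValueError("%s is not a Fortran logical: %r"
                         % (name, self.options[name]))


nl = NamelistSketch(NAMELIST_TEXT)
num_halos = nl.getint("config_num_halos")
explicit_decomp = nl.getbool("config_explicit_proc_decomp")
```

Because the getters cast on access, asking for the wrong type (for example getint on a time string) raises ValueError from the built-in conversion, with no extra checking code.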

return_val = return_val.strip('"').strip("'")
return return_val

def write(self, name, value):
Collaborator

I don't think we want namelist write functionality in this repo. @pwolfram, can you give me an example of where we might need this in this repo?

Contributor Author

@xylar, we may want to use this type of code to make edits to namelists / streams for automatic test cases. This was the context that this code was originally written to address. I think there is an advantage to having this functionality, even if we don't plan to use it immediately, because we want the O part of IO too in order to make this a general tool. For example, I can easily envision using MPAS-Analysis to set up / analyze test cases in the testing core, and this would be useful in that endeavor.

@xylar
Collaborator

xylar commented Oct 15, 2016

@pwolfram wrote:

the thing we need to focus on here is the API for the classes that interface with the namelist and streams files

Can we change the API to be more like ConfigParser, using get for strings and getint, getfloat and getbool for those respective types? Adapting your example from above:

# read typed options from the namelist
nl = Namelist(nlistpath)
dt = nl.getfloat('config_dt')
timeInteg = nl.get('config_time_integrator')
numHalos = nl.getint('config_num_halos')
explicitProcDecomp = nl.getbool('config_explicit_proc_decomp')

# get name for mesh file
sf = XMLList(streamspath)
meshname = sf.read('mesh', 'filename_template')

@pwolfram
Contributor Author

@xylar, would we want to have a dictionary-like capability as well as the explicit calling functions? I don't see why not. It may be more elegant, though also riskier, to try to do automatic type-casting with output / access via a dictionary-like structure.
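
To make the trade-off concrete, here is a minimal sketch of dictionary-like access with automatic casting. The names autocast and AutoCastOptions, the sample values, and the option config_hypothetical_label are assumptions for illustration; none of this is from the PR.

```python
from collections.abc import Mapping


def autocast(raw):
    """Guess a Python type for a raw namelist string (heuristic, may misfire)."""
    lowered = raw.lower()
    if lowered in (".true.", "true"):
        return True
    if lowered in (".false.", "false"):
        return False
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    # Anything that is neither a logical nor a number stays a string.
    return raw


class AutoCastOptions(Mapping):
    def __init__(self, raw_options):
        self._raw = dict(raw_options)

    def __getitem__(self, name):
        return autocast(self._raw[name])

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)


options = AutoCastOptions({
    "config_num_halos": "3",
    "config_explicit_proc_decomp": ".false.",
    "config_run_duration": "0100_00:00:00",
    "config_hypothetical_label": "007",
})
```

The risk shows up with the last entry: a value meant as the string "007" is silently returned as the integer 7. Explicit getters such as getint and get avoid this kind of surprise because the caller states the expected type.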

@xylar
Collaborator

xylar commented Oct 21, 2016

@pwolfram, I assume you're still working on updating this PR. Let me know if you're waiting on anything from me.

@pwolfram
Contributor Author

@xylar, you are correct: I have not done anything on this since we chatted Monday. If I'm holding someone up I can increase the priority on this and finish it ASAP.

@milenaveneziani
Collaborator

@pwolfram, @xylar: to put it into perspective, this PR, together with a future one on mpas_xarray, has higher priority than anything else, because the changes that went into alpha8 and those that will go into alpha9 break the scripts. In alpha8, we changed filenames. In alpha9, we will be changing timeSeriesStats instances, and the variable names will change as a consequence (this of course involves changes in this PR and in mpas_xarray and other Python scripts). I think it would be good if we could solve these issues in the next couple of weeks, if possible.
Do you think it is feasible?

@pwolfram
Contributor Author

@xylar, I've updated the code to reflect our conversation earlier this week. Please let me know what you think. We should now have read functionality for namelist and streams files, with testing via pytest, which gets us one step away from CI testing for all PRs in the future.

@pwolfram
Contributor Author

P.S. Obviously the commits need to be squashed, but this can be done after you take a look and before the merge. cc @milenaveneziani

'0100_00:00:00')


# NOTE, MAY NEED TO SANITIZE NAMELIST AND STREAMS FILES A LITTLE BIT FOR
Collaborator

@pwolfram, I think the example namelists and streams are okay. You can simplify them if you want. But I'd remove this comment either way.

@xylar
Collaborator

xylar commented Oct 23, 2016

@pwolfram, this looks good. I confirm that the tests seem to cover our bases and that they pass with the example namelist and streams file. If you would squash the commits and remove the note I mentioned above (after simplifying the streams file if you like), I will merge.

I don't think there is a particular need for better type checking in this PR. It seems sufficient to me if type errors are raised when the various get* methods of Namelist are used incorrectly. If you feel that better type checking is urgently needed, please make these modifications.

@milenaveneziani
Collaborator

@pwolfram, @xylar: thanks a bunch for working on this!
I am eager to try this out on ACME output.

@pwolfram pwolfram force-pushed the namelist_streams_interface branch from 05546c5 to d761229 Compare October 24, 2016 20:58
@pwolfram pwolfram force-pushed the namelist_streams_interface branch from d761229 to b06dd4f Compare October 24, 2016 21:00
@pwolfram
Contributor Author

@xylar, I think this should be ready to merge following your quick double-check on the changes.

@xylar
Collaborator

xylar commented Oct 24, 2016

Great, I'll take a look as soon as I can.

@pwolfram
Contributor Author

Thanks @xylar!

@xylar
Collaborator

xylar commented Oct 25, 2016

@pwolfram, I am going to merge this soon. In the future, could you make the description of the PR something that is appropriate as a commit message for the merge? This means it should not include references to other PRs by number and should describe what is in the PR, as opposed to what might be added to the PR. I have modified the commit message to remove/clean up these issues.

@xylar xylar merged commit b06dd4f into MPAS-Dev:master Oct 25, 2016
@xylar
Collaborator

xylar commented Oct 25, 2016

@pwolfram, I made sure the merged branch passed the tests. The new code doesn't touch the existing analysis in any way so I didn't bother to test that the analysis itself still runs correctly.

@xylar
Collaborator

xylar commented Oct 25, 2016

@pwolfram, please delete the remote branch, since I don't have permission.

@pwolfram pwolfram deleted the namelist_streams_interface branch October 25, 2016 11:40
@pwolfram
Contributor Author

@xylar, thanks for the feedback on the PR description. I'll put checklists with an introduction of "Features of this merge include" and reference other issues in a comment outside the PR description.

@pwolfram pwolfram mentioned this pull request Oct 25, 2016