Add UO SRML data reader #594

Merged
merged 10 commits into from
Oct 2, 2018

Conversation

@lboeman (Contributor) commented Sep 27, 2018

pvlib python pull request guidelines

Thank you for your contribution to pvlib python! You may delete all of these instructions except for the list below.

You may submit a pull request with your code at any stage of completion.

The following items must be addressed before the code can be merged. Please don't hesitate to ask for help if you're unsure of how to accomplish any of the items below:

  • Closes add UO SRML data reader #589
  • I am familiar with the contributing guidelines.
  • Fully tested. Added and/or modified tests to ensure correct behavior for all reasonable inputs. Tests (usually) must pass on the TravisCI and Appveyor testing services.
  • Updates entries to docs/sphinx/source/api.rst for API changes.
  • Adds description and name entries in the appropriate docs/sphinx/source/whatsnew file for all changes.
  • Code quality and style is sufficient. Passes LGTM and SticklerCI checks.
  • New code is fully documented. Includes sphinx/numpydoc compliant docstrings and comments in the code where necessary.
  • Pull request is nearly complete and ready for detailed review.

Brief description of the problem and proposed solution (if not already fully described in the issue linked to above):


# Quality flags are all labeled 0, but occur immediately after their
# associated var so we create a dict mapping them to var_flag for renaming
flag_label_map = {flag: data.columns[data.columns.get_loc(flag)-1]+'_flag'

E226 missing whitespace around arithmetic operator

data = data.rename(columns=flag_label_map)
# For data flagged bad or missing, replace the value with np.NaN
for col in data.columns[::2]:
    data[col] = data[col].where(~(data[col+'_flag'] == 99), np.NaN)

E226 missing whitespace around arithmetic operator
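The flag handling in this hunk can be sketched end-to-end on a toy frame (the values and the raw flag headers `flag_a`/`flag_b` below are invented; in the real SRML files every flag column is simply headed 0, per the comment in the diff):

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the SRML layout: each data column is
# immediately followed by its quality-flag column.
data = pd.DataFrame({
    'ghi_1': [100.0, 200.0, -999.0],
    'flag_a': [11, 11, 99],
    'dni_1': [400.0, 500.0, 600.0],
    'flag_b': [11, 11, 11],
})

# Rename each flag column to '<preceding data column>_flag'.
flag_label_map = {flag: data.columns[data.columns.get_loc(flag) - 1] + '_flag'
                  for flag in data.columns[1::2]}
data = data.rename(columns=flag_label_map)

# For data flagged 99 (bad or missing), replace the value with NaN.
for col in data.columns[::2]:
    data[col] = data[col].where(data[col + '_flag'] != 99, np.nan)

print(data['ghi_1'].tolist())  # -> [100.0, 200.0, nan]
```

The flag value 99 is the one tested in the diff; the real reader builds the map from every other column in the same way.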

    except KeyError:
        return col
    try:
        return var_map[col[:3]]+'_'+col[3:]

E226 missing whitespace around arithmetic operator

year=year % 100,
month=month)
url = "http://solardat.uoregon.edu/download/Archive/"
data = read_srml(url+file_name)

E226 missing whitespace around arithmetic operator
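Pieced together from this hunk and the test URL quoted later in the thread (EUPO1801.txt = station EU, filetype PO, year 18, month 01), the file-name construction can be sketched as below (the helper name is hypothetical):

```python
def srml_file_url(station, year, month, filetype='PO'):
    """Build an Archive URL, e.g. ('EU', 2018, 1) -> ...EUPO1801.txt."""
    # Two-digit year and zero-padded month, as in the hunk above.
    file_name = '{station}{filetype}{year:02d}{month:02d}.txt'.format(
        station=station, filetype=filetype, year=year % 100, month=month)
    url = "http://solardat.uoregon.edu/download/Archive/"
    return url + file_name

print(srml_file_url('EU', 2018, 1))
# -> http://solardat.uoregon.edu/download/Archive/EUPO1801.txt
```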

@wholmgren wholmgren added this to the 0.6.1 milestone Sep 28, 2018
@wholmgren wholmgren added enhancement solarfx2 DOE SETO Solar Forecasting 2 / Solar Forecast Arbiter io labels Sep 28, 2018
@wholmgren (Member) left a comment

I think it's a good idea to include a data file for testing and demo purposes. I'm not sure about the length. Usually I like short files that are easy to visually scan, but this is a 1-minute resolution dataset, so limiting it to only a day would still produce a file that is difficult to visually scan.


def read_srml(filename):
"""
Read SRML file into pandas dataframe.
Member

read_srml is probably ok for the function name. And SRML might be ok for the first line of the doc string. But then we definitely need to say the full name: University of Oregon Solar Radiation Measurement Laboratory. And provide a link to their website in a References section.

2400 hours, and to avoid time parsing errors on leap years. Data values
on a given line should now be understood to occur during the interval
extending from the time of the line in which they are listed to
the ending time on the next line, rather than the previous line.

year = tsv_data.columns[1]
data = format_index(tsv_data, year)
# Rename and drop datetime columns
data = data[data.columns[2:]].rename(columns=map_columns)
Member

two lines here will make debugging easier when it inevitably breaks.

data = data[data.columns[2:]].rename(columns=map_columns)

# Quality flags are all labeled 0, but occur immediately after their
# associated var so we create a dict mapping them to var_flag for renaming
Member

Add an example like: # ghi_1, 0, dni_1, 0 is mapped to ghi_1, ghi_1_flag, dni_1, dni_1_flag

Spectral data (7xxx) uses all four digits to indicate the
variable.
"""
var_map = {
Member

this is ok but I am wondering if it would be better to define it at the top of the module. Then it's easier for people to inspect it if they are interested.

Contributor Author

I agree; I must have misunderstood something the other day. I'll move this to the top of the module.

    except KeyError:
        return col
    try:
        return var_map[col[:3]] + '_' + col[3:]
Member

recommend more explicit code like

variable_name = var_map[col[:3]]
variable_number = col[3:]
return variable_name + '_' + variable_number
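The suggestion above expands to a helper along these lines (a sketch; the element-number subset is the one quoted elsewhere in this review, and unmapped elements such as spectral 7xxx fall through unchanged):

```python
# Assumed subset of UO SRML data element numbers, as quoted in this review.
VARIABLE_MAP = {
    '100': 'ghi',
    '931': 'temp_dew',
    '933': 'relative_humidity',
    '937': 'temp_cell',
}


def map_columns(col):
    """Map a raw SRML header like '1001' to 'ghi_1'."""
    try:
        variable_name = VARIABLE_MAP[col[:3]]
    except KeyError:
        # Unrecognized element numbers pass through unchanged.
        return col
    variable_number = col[3:]
    return variable_name + '_' + variable_number


print(map_columns('1001'))  # -> ghi_1
print(map_columns('9312'))  # -> temp_dew_2
```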



def request_srml_data(station, year, month, filetype='PO'):
"""Read a month of SRML data from solardat into a Dataframe.
Member

This needs the same references as the basic read function above.

return df


def request_srml_data(station, year, month, filetype='PO'):
Member

maybe read_srml_from_solardat and import it in __init__.py?

Contributor Author

Should this module include these kinds of util functions? I wasn't sure about including this here, or a function to stitch months together, since their function signatures are pretty far off from the read_<data_type> functions.

Member

I think the module should include any generally useful functions for IO related to the UOSRML. That would include downloading directly from their website.

I suppose this function might more accurately be called read_srml_month_from_solardat

I'm less sure about a function to download and stitch together multiple months because you said that the columns sometimes change. If we made it, that's the function that would be called read_srml_from_solardat

@@ -1,2 +1,4 @@
from pvlib.iotools.tmy import read_tmy2 # noqa: F401
from pvlib.iotools.tmy import read_tmy3 # noqa: F401
from pvlib.iotools.srml import read_srml # noqa: F401
from pvlib.iotools.srml import read_srml_month_from_solardat # noqa: F401

E261 at least two spaces before inline comment

numbers `here. <http://solardat.uoregon.edu/DataElementNumbers.html>`_
"""
variable_map = {
'100': 'ghi',

E126 continuation line over-indented for hanging indent

'931': 'temp_dew',
'933': 'relative_humidity',
'937': 'temp_cell',
}

E121 continuation line under-indented for hanging indent


@network
def test_read_srml_month_from_solardat():
file_data = srml.read_srml('http://solardat.uoregon.edu/download/Archive/EUPO1801.txt')

E501 line too long (91 > 79 characters)

@lboeman (Contributor Author) commented Oct 1, 2018

Updated in response to some of your comments:

  • I've added references to the University of Oregon Solar Radiation Monitoring Laboratory to read_srml and read_srml_month_from_solardat. If there is a better format for those attributions, I can update them properly.
  • I had originally included a full month of data for testing, but I created a smaller file with only a day of data called SRML-day-EUPO1801.txt. The file only includes three variables to make it easier to scan.
  • Added comments for clarity around flag parsing code.

Would it be useful if this could handle data other than 1-min resolution?

"""
df_time = df[df.columns[1]] - 1
df_doy = df[df.columns[0]]
fifty_nines = df_time % 100 == 99
@cwhanse (Member) commented Oct 1, 2018

Should this be ==59? Or are there lines in the data where the hour of day is something like hh99? A comment here explaining this line and the next would be appropriate.

Contributor Author

These variable names might be more confusing than I had intended. 59 is the desired result here: after reducing the times by 1, all of the on-the-hour times become HH99 (e.g. 2400 -> 2399), so they need to be adjusted.
This was to get around exceptions when trying to parse a datetime from day 366, hour 2400 of a leap year. It is admittedly quite confusing. I will document the rationale better here.

Member

Aha, makes sense now. Because of -1, the times run from hh00, hh01, ..., hh58, hh99. Minor nit - putting df_doy = ahead of df_time = would help me see the connection.
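That minute-shift logic can be sketched as follows (variable names follow the diff; 2016 is an arbitrary leap year, chosen so that day 366 exists):

```python
import pandas as pd

doy = pd.Series([365, 366, 366])    # day of year
hhmm = pd.Series([2400, 100, 101])  # interval-ending labels, 0001..2400

t = hhmm - 1                        # 2399, 99, 100: 2400 is now parseable
fifty_nines = t % 100 == 99         # top-of-hour labels turned into hh99
hours = t // 100
minutes = (t % 100).where(~fifty_nines, 59)

index = pd.to_datetime(
    ['2016-{:03d} {:02d}:{:02d}'.format(int(d), int(h), int(m))
     for d, h, m in zip(doy, hours, minutes)],
    format='%Y-%j %H:%M')
print(index)
# 2016-12-30 23:59, 2016-12-31 00:59, 2016-12-31 01:00
```

So the last label of the year, day 366 at 2400, parses cleanly as 2016-12-31 23:59 would in the full dataset, with each row labeled one minute before its interval-ending time.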

@cwhanse (Member) left a comment

LGTM

@wholmgren (Member) left a comment

we're close

the fourth indicating the instrument. Spectral data (7xxx) uses all
four digits to indicate the variable. See a full list of data element
numbers `here. <http://solardat.uoregon.edu/DataElementNumbers.html>`_
"""
Member

Is this supposed to be a comment or part of the module documentation? If a comment, use # to start each line (pep8). If documentation, I think it should be combined with first block quote. We haven't set any standards about documenting module variables, so either approach is ok with me.

four digits to indicate the variable. See a full list of data element
numbers `here. <http://solardat.uoregon.edu/DataElementNumbers.html>`_
"""
variable_map = {
Member

VARIABLE_MAP?


Notes
-----
Note that the time index is shifted back one minute to account for
Member

Delete Note that

-----
Note that the time index is shifted back one minute to account for
2400 hours, and to avoid time parsing errors on leap years. Data values
on a given line should now be understood to occur during the interval
Member

Replace "Data values ... previous line" with "The returned data values should be understood to occur during the interval from the time of the row until the time of the next row. This is consistent with pandas' default labeling behavior."


References
----------

Member

No blank line needed here.

assert data.index[0] == start
assert data.index[-1] == end
assert (data.index[59::60].minute == 59).all()
assert year not in data.columns
Member

can this assert fail? it's type sensitive...

In [3]: df = pd.DataFrame(columns=['2016'])

In [4]: '2016' in df.columns
Out[4]: True

In [5]: 2016 in df.columns
Out[5]: False

In [6]: df = pd.DataFrame(columns=[2016])

In [7]: '2016' in df.columns
Out[7]: False

In [8]: 2016 in df.columns
Out[8]: True

@wholmgren (Member)
great, thanks @lboeman

@wholmgren wholmgren merged commit bc80cbf into pvlib:master Oct 2, 2018