Add UO SRML data reader #594

Merged
merged 10 commits into from
Oct 2, 2018

Conversation

@lboeman (Contributor) commented Sep 27, 2018

pvlib python pull request guidelines

Thank you for your contribution to pvlib python! You may delete all of these instructions except for the list below.

You may submit a pull request with your code at any stage of completion.

The following items must be addressed before the code can be merged. Please don't hesitate to ask for help if you're unsure of how to accomplish any of the items below:

  • Closes add UO SRML data reader #589
  • I am familiar with the contributing guidelines.
  • Fully tested. Added and/or modified tests to ensure correct behavior for all reasonable inputs. Tests (usually) must pass on the TravisCI and Appveyor testing services.
  • Updates entries to docs/sphinx/source/api.rst for API changes.
  • Adds description and name entries in the appropriate docs/sphinx/source/whatsnew file for all changes.
  • Code quality and style is sufficient. Passes LGTM and SticklerCI checks.
  • New code is fully documented. Includes sphinx/numpydoc compliant docstrings and comments in the code where necessary.
  • Pull request is nearly complete and ready for detailed review.

Brief description of the problem and proposed solution (if not already fully described in the issue linked to above):


# Quality flags are all labeled 0, but occur immediately after their
# associated var so we create a dict mapping them to var_flag for renaming
flag_label_map = {flag: data.columns[data.columns.get_loc(flag)-1]+'_flag'

E226 missing whitespace around arithmetic operator

data = data.rename(columns=flag_label_map)
# For data flagged bad or missing, replace the value with np.NaN
for col in data.columns[::2]:
    data[col] = data[col].where(~(data[col+'_flag'] == 99), np.NaN)

E226 missing whitespace around arithmetic operator
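The flag handling in this hunk can be sketched end-to-end on a toy frame (the values and the raw flag headers `flag_a`/`flag_b` below are invented; in the real SRML files every flag column is simply headed 0, per the comment in the diff):

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the SRML layout: each data column is
# immediately followed by its quality-flag column.
data = pd.DataFrame({
    'ghi_1': [100.0, 200.0, -999.0],
    'flag_a': [11, 11, 99],
    'dni_1': [400.0, 500.0, 600.0],
    'flag_b': [11, 11, 11],
})

# Rename each flag column to '<preceding data column>_flag'.
flag_label_map = {flag: data.columns[data.columns.get_loc(flag) - 1] + '_flag'
                  for flag in data.columns[1::2]}
data = data.rename(columns=flag_label_map)

# For data flagged 99 (bad or missing), replace the value with NaN.
for col in data.columns[::2]:
    data[col] = data[col].where(data[col + '_flag'] != 99, np.nan)

print(data['ghi_1'].tolist())  # -> [100.0, 200.0, nan]
```

The flag value 99 is the one tested in the diff; the real reader builds the map from every other column in the same way.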

    except KeyError:
        return col
    try:
        return var_map[col[:3]]+'_'+col[3:]

E226 missing whitespace around arithmetic operator

year=year % 100,
month=month)
url = "http://solardat.uoregon.edu/download/Archive/"
data = read_srml(url+file_name)

E226 missing whitespace around arithmetic operator
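Pieced together from this hunk and the test URL quoted later in the thread (EUPO1801.txt = station EU, filetype PO, year 18, month 01), the file-name construction can be sketched as below (the helper name is hypothetical):

```python
def srml_file_url(station, year, month, filetype='PO'):
    """Build an Archive URL, e.g. ('EU', 2018, 1) -> ...EUPO1801.txt."""
    # Two-digit year and zero-padded month, as in the hunk above.
    file_name = '{station}{filetype}{year:02d}{month:02d}.txt'.format(
        station=station, filetype=filetype, year=year % 100, month=month)
    url = "http://solardat.uoregon.edu/download/Archive/"
    return url + file_name

print(srml_file_url('EU', 2018, 1))
# -> http://solardat.uoregon.edu/download/Archive/EUPO1801.txt
```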

@wholmgren wholmgren added this to the 0.6.1 milestone Sep 28, 2018
@wholmgren wholmgren added enhancement solarfx2 DOE SETO Solar Forecasting 2 / Solar Forecast Arbiter io labels Sep 28, 2018
@wholmgren (Member) left a comment

I think it's a good idea to include a data file for testing and demo purposes. I'm not sure about the length. Usually I like short files that are easy to visually scan, but this is a 1-minute resolution dataset, so limiting it to only a day would still produce a file that is difficult to visually scan.


def read_srml(filename):
"""
Read SRML file into pandas dataframe.
Member

read_srml is probably ok for the function name. And SRML might be ok for the first line of the doc string. But then we definitely need to say the full name: University of Oregon Solar Radiation Measurement Laboratory. And provide a link to their website in a References section.

2400 hours, and to avoid time parsing errors on leap years. Data values
on a given line should now be understood to occur during the interval
extending from the time of the line in which they are listed to
the ending time on the next line, rather than the previous line.

year = tsv_data.columns[1]
data = format_index(tsv_data, year)
# Rename and drop datetime columns
data = data[data.columns[2:]].rename(columns=map_columns)
Member

two lines here will make debugging easier when it inevitably breaks.

data = data[data.columns[2:]].rename(columns=map_columns)

# Quality flags are all labeled 0, but occur immediately after their
# associated var so we create a dict mapping them to var_flag for renaming
Member

Add an example like: # ghi_1, 0, dni_1, 0 is mapped to ghi_1, ghi_1_flag, dni_1, dni_1_flag

Spectral data (7xxx) uses all four digits to indicate the
variable.
"""
var_map = {
Member

this is ok but I am wondering if it would be better to define it at the top of the module. Then it's easier for people to inspect it if they are interested.

Contributor Author

I agree; I must have misunderstood something the other day. I'll move this to the top of the module.

    except KeyError:
        return col
    try:
        return var_map[col[:3]] + '_' + col[3:]
Member

recommend more explicit code like

variable_name = var_map[col[:3]]
variable_number = col[3:]
return variable_name + '_' + variable_number
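The suggestion above expands to a helper along these lines (a sketch; the element-number subset is the one quoted elsewhere in this review, and unmapped elements such as spectral 7xxx fall through unchanged):

```python
# Assumed subset of UO SRML data element numbers, as quoted in this review.
VARIABLE_MAP = {
    '100': 'ghi',
    '931': 'temp_dew',
    '933': 'relative_humidity',
    '937': 'temp_cell',
}


def map_columns(col):
    """Map a raw SRML header like '1001' to 'ghi_1'."""
    try:
        variable_name = VARIABLE_MAP[col[:3]]
    except KeyError:
        # Unrecognized element numbers pass through unchanged.
        return col
    variable_number = col[3:]
    return variable_name + '_' + variable_number


print(map_columns('1001'))  # -> ghi_1
print(map_columns('9312'))  # -> temp_dew_2
```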



def request_srml_data(station, year, month, filetype='PO'):
"""Read a month of SRML data from solardat into a Dataframe.
Member

This needs the same references as the basic read function above.

return df


def request_srml_data(station, year, month, filetype='PO'):
Member

maybe read_srml_from_solardat and import it in __init__.py?

Contributor Author

Should this module include these kinds of util functions? I wasn't sure about including this here, or a function to stitch months together, since their function signatures are pretty far off from the read_<data_type> functions.

Member

I think the module should include any generally useful functions for IO related to the UOSRML. That would include downloading directly from their website.

I suppose this function might more accurately be called read_srml_month_from_solardat

I'm less sure about a function to download and stitch together multiple months because you said that the columns sometimes change. If we made it, that's the function that would be called read_srml_from_solardat

@@ -1,2 +1,4 @@
from pvlib.iotools.tmy import read_tmy2 # noqa: F401
from pvlib.iotools.tmy import read_tmy3 # noqa: F401
from pvlib.iotools.srml import read_srml # noqa: F401
from pvlib.iotools.srml import read_srml_month_from_solardat # noqa: F401

E261 at least two spaces before inline comment

numbers `here. <http://solardat.uoregon.edu/DataElementNumbers.html>`_
"""
variable_map = {
'100': 'ghi',

E126 continuation line over-indented for hanging indent

'931': 'temp_dew',
'933': 'relative_humidity',
'937': 'temp_cell',
}

E121 continuation line under-indented for hanging indent


@network
def test_read_srml_month_from_solardat():
file_data = srml.read_srml('http://solardat.uoregon.edu/download/Archive/EUPO1801.txt')

E501 line too long (91 > 79 characters)

@lboeman (Contributor Author) commented Oct 1, 2018

Updated in response to some of your comments:

  • I've added references to the University of Oregon Solar Radiation Monitoring Laboratory to read_srml and read_srml_month_from_solardat. If there is a better format for those attributions, I can update them properly.
  • I had originally included a full month of data for testing, but I created a smaller file with only a day of data called SRML-day-EUPO1801.txt. The file only includes three variables to make it easier to scan.
  • Added comments for clarity around flag parsing code.

Would it be useful if this could handle data other than 1-min resolution?

"""
df_time = df[df.columns[1]] - 1
df_doy = df[df.columns[0]]
fifty_nines = df_time % 100 == 99
@cwhanse (Member) commented Oct 1, 2018

Should this be ==59? Or are there lines in the data where the hour of day is something like hh99? A comment here explaining this line and the next would be appropriate.

Contributor Author

These variable names might be more confusing than I had intended. 59 is the desired result here: after reducing the times by 1, all of the on-the-hour times become HH99 (e.g. 2400 -> 2399), so they need to be adjusted.
This was to get around exceptions when trying to parse a datetime from day 366, hour 2400 of a leap year. It is admittedly quite confusing. I will document the rationale better here.

Member

Aha, makes sense now. Because of -1, the times run from hh00, hh01, ..., hh58, hh99. Minor nit - putting df_doy = ahead of df_time = would help me see the connection.
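That minute-shift logic can be sketched as follows (variable names follow the diff; 2016 is an arbitrary leap year, chosen so that day 366 exists):

```python
import pandas as pd

doy = pd.Series([365, 366, 366])    # day of year
hhmm = pd.Series([2400, 100, 101])  # interval-ending labels, 0001..2400

t = hhmm - 1                        # 2399, 99, 100: 2400 is now parseable
fifty_nines = t % 100 == 99         # top-of-hour labels turned into hh99
hours = t // 100
minutes = (t % 100).where(~fifty_nines, 59)

index = pd.to_datetime(
    ['2016-{:03d} {:02d}:{:02d}'.format(int(d), int(h), int(m))
     for d, h, m in zip(doy, hours, minutes)],
    format='%Y-%j %H:%M')
print(index)
# 2016-12-30 23:59, 2016-12-31 00:59, 2016-12-31 01:00
```

So the last label of the year, day 366 at 2400, parses cleanly as 2016-12-31 23:59 would in the full dataset, with each row labeled one minute before its interval-ending time.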

@cwhanse (Member) left a comment

LGTM

@wholmgren (Member) left a comment

we're close

the fourth indicating the instrument. Spectral data (7xxx) uses all
four digits to indicate the variable. See a full list of data element
numbers `here. <http://solardat.uoregon.edu/DataElementNumbers.html>`_
"""
Member

Is this supposed to be a comment or part of the module documentation? If a comment, use # to start each line (pep8). If documentation, I think it should be combined with first block quote. We haven't set any standards about documenting module variables, so either approach is ok with me.

four digits to indicate the variable. See a full list of data element
numbers `here. <http://solardat.uoregon.edu/DataElementNumbers.html>`_
"""
variable_map = {
Member

VARIABLE_MAP?


Notes
-----
Note that the time index is shifted back one minute to account for
Member

Delete Note that

-----
Note that the time index is shifted back one minute to account for
2400 hours, and to avoid time parsing errors on leap years. Data values
on a given line should now be understood to occur during the interval
Member

Replace "Data values ... previous line" with "The returned data values should be understood to occur during the interval from the time of the row until the time of the next row. This is consistent with pandas' default labeling behavior."


References
----------

Member

No blank line needed here.

assert data.index[0] == start
assert data.index[-1] == end
assert (data.index[59::60].minute == 59).all()
assert year not in data.columns
Member

can this assert fail? it's type sensitive...

In [3]: df = pd.DataFrame(columns=['2016'])

In [4]: '2016' in df.columns
Out[4]: True

In [5]: 2016 in df.columns
Out[5]: False

In [6]: df = pd.DataFrame(columns=[2016])

In [7]: '2016' in df.columns
Out[7]: False

In [8]: 2016 in df.columns
Out[8]: True

@wholmgren (Member)
great, thanks @lboeman

@wholmgren wholmgren merged commit bc80cbf into pvlib:master Oct 2, 2018