Skip to content

Add numpy SPECTRL2 implementation #1062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Jan 4, 2021
Merged

Conversation

kandersolar
Copy link
Member

@kandersolar kandersolar commented Sep 20, 2020

  • Closes #xxxx
  • I am familiar with the contributing guidelines
  • Tests added
  • Updates entries to docs/sphinx/source/api.rst for API changes.
  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

Additional to-do:

  • Get it to broadcast across vector inputs
  • remove debugging code

Calculating the spectral distribution for a single timestamp takes under a millisecond on my laptop. I've been comparing against the solar_utils wrapper around the NREL C implementation (thanks @mikofski for making it so easy to use!) and have discovered several discrepancies between the equations in the report, the report code appendix, and the C implementation. It doesn't help that the C implementation uses its own (non-SPA) solar position calculator. By matching the python code to the C implementation for each of the differences, the difference in output spectra is small enough for me to start blaming the accumulated error from the C implementation's single-precision floats:

Discrepancies:

Equation Report Appendix spectrl2_2.c
2-4 1.335 1.335 1.3366
2-11 118.93 118.93 118.3
3-8 To' Tu' Tu'
3-5, 3-6, 3-7, 3-1 double-count Cs single-count Cs single-count Cs
2-5 kasten1966 kasten1966 kastenyoung1989

image

@kandersolar
Copy link
Member Author

I think this is ready for review whenever folks get a chance. For the sake of ease of testing I've aligned all the inconsistencies (see table in original comment) with spectrl2_2.c but I'm open to alternatives. See https://gist.github.com/kanderso-nrel/0dc7e7ebff7b5a9c13b52f8b48eb71ee for the validation scratchpad I've been using.

I created a new spectrum module for this. Would atmosphere be a better home? Could consider moving the other spectral functions (pvlib.atmosphere.first_solar_spectral_correction, pvlib.pvsystem.sapm_spectral_loss) here if not.

Thanks!

@kandersolar kandersolar marked this pull request as ready for review September 20, 2020 21:42
@cwhanse
Copy link
Member

cwhanse commented Sep 21, 2020

I created a new spectrum module for this. Would atmosphere be a better home? Could consider moving the other spectral functions (pvlib.atmosphere.first_solar_spectral_correction, pvlib.pvsystem.sapm_spectral_loss) here if not.

I'm in favor the new spectrum.py and, related to #436, eventually moving functions that calculate spectral modifiers from pvsystem.py and atmosphere.py into spectrum.py.

@cwhanse cwhanse added this to the 0.8.1 milestone Sep 21, 2020
@adriesse
Copy link
Member

Nice work.

@cwhanse cwhanse assigned cwhanse and unassigned cwhanse Dec 10, 2020
@cwhanse cwhanse self-requested a review December 10, 2020 15:53
Copy link
Member

@wholmgren wholmgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I didn't compare the implementation to the reference.


Parameters
----------
surface_tilt : float or numpy array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with numpy array but worth considering numpy.ndarray for precision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to say something about the dimensions of the array arguments?

Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll volunteer to check this against the reference.

Is there a reason to exclude pandas.Series from the input types?


Parameters
----------
surface_tilt : float or numpy array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to say something about the dimensions of the array arguments?

# Is = Ir + Ia + Ig # Eq 3-1
Is = (Ir + Ia + Ig) * Cs # Eq 3-1

# calculate spectral irradiance on a tilted surface, Eq 3-18
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could transposition be a separate function? Or could the use be expected to do it using the existing transposition functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a separate function, but I'm inclined to implement the complete model.

If users want to do their own transposition, is there any issue with them passing surface_tilt=0 and transposing afterwards (other than the inefficiency of doing an unnecessary transposition in this function)?

Copy link
Member Author

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to exclude pandas.Series from the input types?

Probably, but I'll have to remind myself what it is. I'd guess to make broadcasting easier. Whatever the reason, re-wrapping in a Series before returning would probably not be an issue.

# Is = Ir + Ia + Ig # Eq 3-1
Is = (Ir + Ia + Ig) * Cs # Eq 3-1

# calculate spectral irradiance on a tilted surface, Eq 3-18
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a separate function, but I'm inclined to implement the complete model.

If users want to do their own transposition, is there any issue with them passing surface_tilt=0 and transposing afterwards (other than the inefficiency of doing an unnecessary transposition in this function)?

@cwhanse
Copy link
Member

cwhanse commented Dec 10, 2020

Is there a reason to exclude pandas.Series from the input types?

Probably, but I'll have to remind myself what it is. I'd guess to make broadcasting easier. Whatever the reason, re-wrapping in a Series before returning would probably not be an issue.

I'm thinking of users who would bring e.g. surface_tilt or apparent_zenith into this function as a Series.

@kandersolar
Copy link
Member Author

kandersolar commented Dec 16, 2020

Inputs now allowed to be pd.Series. I'm having trouble articulating why, but I'm less enthusiastic about returning pandas objects -- housing the spectra in DataFrames feels like it's misusing DataFrames in some way. Maybe it's because the column names would be floats (one per wavelength). To me it seems cleaner to always return arrays, but I wouldn't mind being overruled.

To-do items:

  • fix docs build warning: docstring of pvlib.spectrum.spectrl2:49: WARNING: Unexpected indentation.
  • fix docstring typo "sigle"

@adriesse
Copy link
Member

This module is already quite big isn't it? What about creating pvlib/spectrum/spectral2.py? or pvlib/spectral/spectral2.py? Then if anyone ever translates smarts they'll know where to put it. :)

@adriesse
Copy link
Member

Regarding the conversion from Series to arrays, I find that numpy usually plays pretty well with Series, so I would be interested to learn where it causes problems for you. Maybe you don't really need to convert them all?

several ways from the original report [1]_. The report itself also has
a few differences between the in-text equations and the code appendix.
The list of known differences is shown below. Note that this
implementation follows ``spectrl2_2.c``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you happen to compare with the Excel version from NREL as well?
If there are two distinct variations, we may give the user the choice as we have done for other models.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't compare with the Excel version in any meaningful way. I agree it would be nice to be able to switch between the two (if there is indeed a difference), but I hesitate to add additional complexity to this PR. Future work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe someone at NREL would know if the C code or the Excel version is in some sense authoritative. I agree with deferring any kind of switch.

@cwhanse
Copy link
Member

cwhanse commented Dec 17, 2020

This module is already quite big isn't it? What about creating pvlib/spectrum/spectral2.py? or pvlib/spectral/spectral2.py? Then if anyone ever translates smarts they'll know where to put it. :)

I support this, even though a pvlib SMARTS won't appear any time soon.

@kandersolar
Copy link
Member Author

Thanks @adriesse for the numpy lesson :) The numpy vs pandas issue is that I want to cleanly produce the outer product of two vectors where one axis indexes time and the other indexes wavelength, for example:

N = 2
airmass = pd.Series(range(N), index=pd.date_range('2019-01-01', freq='h', periods=N))
vapor_coeff = np.array([1, 2, 3])[:, np.newaxis]
airmass * vapor_coeff  # fails, but airmass.values * vapor_coeff works and returns an array with shape (3, 2)

Is there a nice way to get this behavior when one is a Series?

@adriesse
Copy link
Member

I guess I haven't run into this exact situation often. When one argument is a Series, the pandas.__mul__ function takes over, and it doesn't play along. (Might even warrant raising an issue.)

A clean way to get the desired behavior would be to use np.outer. This would also make the intent entirely obvious, which is not the case when you rely on broadcasting with *. It would require you to review each multiplication to see which one needs it, which is obviously a bit of work. If you need a lot of them, it might indeed encumber the code and be considered "not nice". On the other hand making the code intent obvious is "nice" in another way.

I won't hold anything up though. I just provide my thoughts on what is overall a very good contribution to pvlib!

('water_vapor_absorption', 'float64'),
('ozone_absorption', 'float64'),
('mixed_absorption', 'float64'),
]))
Copy link
Member

@adriesse adriesse Dec 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be little benefit to putting the constants into a structured array. How about 5 constant column vectors like this?

WAVELENGTH = np.transpose(np.array([[
                   300.0, 305..0 etc.
]]))

A side benefit is that one can recognize the use of the constants in the code more easily because they are CAPS.

Of course if you do end up using np.outer the constants can remain row vectors.

Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @kanderso-nrel easily traceable to the reference and well-organized.

wavelength = _SPECTRL2_COEFFS['wavelength'][:, np.newaxis]
spectrum_et = _SPECTRL2_COEFFS['spectral_irradiance_et'][:, np.newaxis]

optical_thickness = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pvlib.atmosphere.angstrom_aod_at_lambda, see

def angstrom_aod_at_lambda(aod0, lambda0, alpha=1.14, lambda1=700.0):

ALG = np.log(1 - aerosol_asymmetry_factor) # Eq 3-14
BFS = ALG * (0.0783 + ALG * (-0.3824 - ALG * 0.5874)) # Eq 3-13
AFS = ALG * (1.459 + ALG * (0.1595 + ALG * 0.4129)) # Eq 3-12
cosZ = cosd(apparent_zenith)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate line

@kandersolar
Copy link
Member Author

FYI I updated the comparison notebook using the current HEAD of this branch. No significant changes from the original plots at the top of this thread. https://gist.github.com/kanderso-nrel/0dc7e7ebff7b5a9c13b52f8b48eb71ee

@adriesse
Copy link
Member

What's a good and convenient source for precipitable_water, ozone, aerosol_turbidity_500nm data? Preferably multiple values per day, USA and/or worldwide.

@mikofski
Copy link
Member

What's a good and convenient source for precipitable_water, ozone, aerosol_turbidity_500nm data? Preferably multiple values per day, USA and/or worldwide.

ECMWF, try get_ecmwf_macc

@adriesse
Copy link
Member

I will! There's a lot of stuff in iotools. Where have I been?

@wholmgren
Copy link
Member

ECMWF, try get_ecmwf_macc

plus angstrom_aod_at_lambda

Are we ready to merge this?

@kandersolar
Copy link
Member Author

There seems to be little benefit to putting the constants into a structured array. How about 5 constant column vectors like this?

I never responded to @adriesse's last suggestion -- I had the vague notion that someone might appreciate the constants already being in tabular form so that they can easily export it for another purpose. Other than that, you're right that columns only get used individually, so not much need for a structured array. For this and using np.outer instead of implicit broadcasting, I don't really feel strongly either way and have a mild preference for wrapping this up, so +1 to merge from me. But happy to make continue making changes if anyone else feels differently.

@wholmgren
Copy link
Member

I liked the structured array when I first read the code, but I also see @adriesse's point. I don't think there's a wrong choice here so I'm going to go ahead and merge. Thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants