Skip to content

Gh9010 yahoo options parsing bug #9024

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Gh9010 yahoo options parsing bug #9024

wants to merge 4 commits into from

Conversation

PollyP
Copy link

@PollyP PollyP commented Dec 6, 2014

closes #9010. Fix for Yahoo Options parse error for underlying prices of 1,000.00 or larger.

Fix 9010, where the Options class will have problems parsing the underlying price as
a float when the price is large enough to be rendered with a thousands separator
(i.e., X,XXX.XX).

I added a utility method, _string_to_float, that sets the locale to English/U.S.,
uses locale's atof function to turn the text into a float, and then resets the
locale to the original locale.

Yahoo appears to always render its numeric data in
the English/U.S. format, regardless of the default locale settings.
If that changes, then the utility method should instead set the locale to the
appropriate setting.
@@ -672,6 +673,20 @@ def _yahoo_url_from_expiry(self, expiry):

return self._FINANCE_BASE_URL + expiry_links[expiry]

@staticmethod
def _string_to_float(string):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much to do float(value) and catch the exception (where u can then substitute out the ,)
no need to do any locale stuff

@jreback
Copy link
Contributor

jreback commented Dec 6, 2014

even better is to construct the frame and then try to coerce the columns with a try except around astype If it fails then do the replace and try again - will be much faster

@PollyP
Copy link
Author

PollyP commented Dec 6, 2014

So you would prefer a solution where I leave Options's underlying_price member as a string, then in the _process_data method do an astype conversion and filter out the comma if an exception is thrown?

@jreback
Copy link
Contributor

jreback commented Dec 6, 2014

yep will be more efficient

@PollyP
Copy link
Author

PollyP commented Dec 6, 2014

Correcting the column type in _process_data won't work -- it breaks get_near_stock_price(), because its call into chop_data() relies on underlying_price already being a float, and doesn't use _process_data. It also breaks the test_sample_page_price_quote_time1 and test_sample_page_price_quote_time2 tests, for the same reasons.

Doing the type checking / possible conversion in the _get_underlying_price() method will make the code cleaner and more cohesive, versus sprinkling around the checking and conversion login in multiple methods. I understand your performance concerns, especially about the locale methods ... but wouldn't simply filtering out the commas be nearly as quick?

Let me know your thoughts.

@@ -698,7 +713,7 @@ def _option_frames_from_url(self, url):

def _get_underlying_price(self, url):
root = self._parse_url(url)
underlying_price = float(root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\
underlying_price = self._string_to_float(root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a try: except around the conversion, if it hits the excpet try a comma subtitute and float conversion, if THAT fails then mark as NaN and move on

Per a suggestion from jreback, replaced the locale.atof conversion code with
a more performant version that just removed comma thousands separators from the
Options class's _get_underlying_price method.
@@ -1192,6 +1203,7 @@ def _process_data(self, frame, type):
frame["Quote_Time"] = np.nan
frame.rename(columns={'Open Int': 'Open_Int'}, inplace=True)
frame['Type'] = type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simpler

In [1]: '1,000'.replace(',','')
Out[1]: '1000'

@PollyP
Copy link
Author

PollyP commented Dec 10, 2014

I understand your point about simplicity, but string.replace() is deprecated in 2.x python and doesn't exist in python 3.x, and the pandas docs (http://pandas.pydata.org/pandas-docs/stable/install.html) claim to support both branches of python. String.join() works in both.

@jreback jreback added this to the 0.16.0 milestone Dec 13, 2014
@PollyP
Copy link
Author

PollyP commented Dec 29, 2014

Thanks for the report, sturmd.

Is there any more housekeeping I need to do here to make sure this is ready for 0.16.0?

@Komnomnomnom
Copy link
Contributor

@PollyP replace was removed from the string helper module but still exists on the string type

i.e.

In [1]: s = 'fof'

In [2]: s.replace('o', 'f')
Out[2]: 'fff'

will work fine in Python 3 but

In [3]: import string

In [4]: s = 'fof'

In [5]: string.replace(s, 'o', 'f')

won't. It's just some cruft they cleared up, the replace method has not been removed or deprecated on the string type.

underlying_price = float(root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\
.getchildren()[0].text)

underlying_price = root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this needs 1 more space to format properly

@jreback
Copy link
Contributor

jreback commented Jan 2, 2015

@PollyP

pls address the above, rebase and squash. ping when green.

@PollyP
Copy link
Author

PollyP commented Jan 2, 2015

OK, will do this this weekend.

On Fri, Jan 2, 2015 at 9:23 AM, jreback [email protected] wrote:

@PollyP https://github.com/PollyP

pls address the above, rebase and squash. ping when green.


Reply to this email directly or view it on GitHub
#9024 (comment).

@PollyP PollyP mentioned this pull request Jan 5, 2015
@PollyP
Copy link
Author

PollyP commented Jan 5, 2015

I wound up syncing and reapplying my changes to create a new PR #9198. I believe it is ready to go.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2015

replaced by #9198

@jreback jreback closed this Jan 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Yahoo options DataReader can't parse underlying prices with commas
3 participants