-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
Gh9010 yahoo options parsing bug #9024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fix 9010, where the Options class will have problems parsing the underlying price as a float when the price is large enough to be rendered with a thousands separator (i.e., X,XXX.XX). I added a utility method, _string_to_float, that sets the locale to English/U.S., uses locale's atof function to turn the text into a float, and then resets the locale to the original locale. Yahoo appears to always render its numeric data in the English/U.S. format, regardless of the default locale settings. If that changes, then the utility method should instead set the locale to the appropriate setting.
@@ -672,6 +673,20 @@ def _yahoo_url_from_expiry(self, expiry): | |||
|
|||
return self._FINANCE_BASE_URL + expiry_links[expiry] | |||
|
|||
@staticmethod | |||
def _string_to_float(string): | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much to do float(value) and catch the exception (where u can then substitute out the ,)
no need to do any locale stuff
even better is to construct the frame and then try to coerce the columns with a try except around astype If it fails then do the replace and try again - will be much faster |
So you would prefer a solution where I leave Options's underlying_price member as a string, then in the _process_data method do an astype conversion and filter out the comma if an exception is thrown? |
yep will be more efficient |
Correcting the column type in _process_data won't work -- it breaks get_near_stock_price(), because its call into chop_data() relies on underlying_price already being a float, and doesn't use _process_data. It also breaks the test_sample_page_price_quote_time1 and test_sample_page_price_quote_time2 tests, for the same reasons. Doing the type checking / possible conversion in the _get_underlying_price() method will make the code cleaner and more cohesive, versus sprinkling around the checking and conversion login in multiple methods. I understand your performance concerns, especially about the locale methods ... but wouldn't simply filtering out the commas be nearly as quick? Let me know your thoughts. |
@@ -698,7 +713,7 @@ def _option_frames_from_url(self, url): | |||
|
|||
def _get_underlying_price(self, url): | |||
root = self._parse_url(url) | |||
underlying_price = float(root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\ | |||
underlying_price = self._string_to_float(root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a try: except around the conversion, if it hits the excpet try a comma subtitute and float conversion, if THAT fails then mark as NaN and move on
Per a suggestion from jreback, replaced the locale.atof conversion code with a more performant version that just removed comma thousands separators from the Options class's _get_underlying_price method.
@@ -1192,6 +1203,7 @@ def _process_data(self, frame, type): | |||
frame["Quote_Time"] = np.nan | |||
frame.rename(columns={'Open Int': 'Open_Int'}, inplace=True) | |||
frame['Type'] = type | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is simpler
In [1]: '1,000'.replace(',','')
Out[1]: '1000'
I understand your point about simplicity, but string.replace() is deprecated in 2.x python and doesn't exist in python 3.x, and the pandas docs (http://pandas.pydata.org/pandas-docs/stable/install.html) claim to support both branches of python. String.join() works in both. |
Thanks for the report, sturmd. Is there any more housekeeping I need to do here to make sure this is ready for 0.16.0? |
@PollyP i.e. In [1]: s = 'fof'
In [2]: s.replace('o', 'f')
Out[2]: 'fff' will work fine in Python 3 but In [3]: import string
In [4]: s = 'fof'
In [5]: string.replace(s, 'o', 'f') won't. It's just some cruft they cleared up, the |
underlying_price = float(root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\ | ||
.getchildren()[0].text) | ||
|
||
underlying_price = root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this needs 1 more space to format properly
pls address the above, rebase and squash. ping when green. |
OK, will do this this weekend. On Fri, Jan 2, 2015 at 9:23 AM, jreback [email protected] wrote:
|
I wound up syncing and reapplying my changes to create a new PR #9198. I believe it is ready to go. |
replaced by #9198 |
closes #9010. Fix for Yahoo Options parse error for underlying prices of 1,000.00 or larger.