Gh9010 yahoo options parsing bug #9024

PollyP · 2014-12-06T19:52:43Z

closes #9010. Fix for Yahoo Options parse error for underlying prices of 1,000.00 or larger.

Fixes #9010, where the Options class fails to parse the underlying price as a float when the price is large enough to be rendered with a thousands separator (e.g., X,XXX.XX). I added a utility method, _string_to_float, that sets the locale to English/U.S., uses the locale module's atof function to convert the text into a float, and then restores the original locale. Yahoo appears to always render its numeric data in the English/U.S. format, regardless of the default locale settings. If that changes, the utility method should instead set the locale to the appropriate setting.

jreback · 2014-12-06T19:56:18Z

pandas/io/data.py

@@ -672,6 +673,20 @@ def _yahoo_url_from_expiry(self, expiry):

        return self._FINANCE_BASE_URL + expiry_links[expiry]

+    @staticmethod
+    def _string_to_float(string):
+        """


Much simpler to do float(value) and catch the exception (where you can then substitute out the comma).
No need to do any locale stuff.

jreback · 2014-12-06T19:57:31Z

Even better is to construct the frame and then try to coerce the columns with a try/except around astype. If it fails, do the replace and try again; that will be much faster.
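A minimal sketch of this column-level idea, assuming the scraped prices arrive as strings in a DataFrame column. The helper name coerce_price_column and the column name Underlying_Price are hypothetical; this is not the code that was merged.

```python
import pandas as pd


def coerce_price_column(frame, column):
    """Hypothetical helper: cast a string column to float, stripping
    comma thousands separators only when the plain cast fails."""
    try:
        # Fast path: every value already parses as a float.
        frame[column] = frame[column].astype(float)
    except ValueError:
        # Slow path: remove comma thousands separators, then retry the cast.
        frame[column] = frame[column].str.replace(',', '', regex=False).astype(float)
    return frame


frame = pd.DataFrame({'Underlying_Price': ['999.50', '1,000.00', '12,345.67']})
frame = coerce_price_column(frame, 'Underlying_Price')
print(frame['Underlying_Price'].tolist())
```

The vectorized cast is attempted once for the whole column, so the string replacement cost is only paid when at least one value actually contains a separator.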

PollyP · 2014-12-06T20:17:31Z

So you would prefer a solution where I leave Options's underlying_price member as a string, then in the _process_data method do an astype conversion and filter out the comma if an exception is thrown?

jreback · 2014-12-06T20:19:17Z

Yep, that will be more efficient.

PollyP · 2014-12-06T22:21:56Z

Correcting the column type in _process_data won't work -- it breaks get_near_stock_price(), because its call into chop_data() relies on underlying_price already being a float, and doesn't use _process_data. It also breaks the test_sample_page_price_quote_time1 and test_sample_page_price_quote_time2 tests, for the same reasons.

Doing the type checking and any conversion in the _get_underlying_price() method will make the code cleaner and more cohesive, versus sprinkling the checking and conversion logic across multiple methods. I understand your performance concerns, especially about the locale methods... but wouldn't simply filtering out the commas be nearly as quick?

Let me know your thoughts.

jreback · 2014-12-07T00:22:09Z

pandas/io/data.py

@@ -698,7 +713,7 @@ def _option_frames_from_url(self, url):

    def _get_underlying_price(self, url):
        root = self._parse_url(url)
-        underlying_price = float(root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\
+        underlying_price = self._string_to_float(root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\


Use a try/except around the conversion. If it hits the except, try a comma substitution and float conversion; if THAT fails, mark it as NaN and move on.
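The three-step order described here (plain float, then comma stripping, then NaN) can be sketched as a standalone function. The name parse_price is hypothetical, and the sketch assumes commas appear only as thousands separators, as on Yahoo's U.S.-formatted pages.

```python
import math


def parse_price(text):
    """Hypothetical helper: parse a scraped price such as '1,234.56'.

    Tries float() first, retries with comma thousands separators removed,
    and falls back to NaN if neither attempt succeeds.
    """
    try:
        return float(text)
    except (TypeError, ValueError):
        pass
    try:
        # Second attempt: drop the thousands separators (U.S. format assumed).
        return float(text.replace(',', ''))
    except (AttributeError, ValueError):
        # Unparseable, or not a string at all: mark as missing and move on.
        return float('nan')


print(parse_price('512.30'))            # 512.3
print(parse_price('1,000.00'))          # 1000.0
print(math.isnan(parse_price('N/A')))   # True
```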

Per a suggestion from jreback, replaced the locale.atof conversion code with a more performant version that just removed comma thousands separators from the Options class's _get_underlying_price method.

jreback · 2014-12-10T11:18:19Z

pandas/io/data.py

@@ -1192,6 +1203,7 @@ def _process_data(self, frame, type):
            frame["Quote_Time"] = np.nan
        frame.rename(columns={'Open Int': 'Open_Int'}, inplace=True)
        frame['Type'] = type
+


This is simpler

In [1]: '1,000'.replace(',','')
Out[1]: '1000'

PollyP · 2014-12-10T17:38:38Z

I understand your point about simplicity, but string.replace() is deprecated in 2.x python and doesn't exist in python 3.x, and the pandas docs (http://pandas.pydata.org/pandas-docs/stable/install.html) claim to support both branches of python. String.join() works in both.

PollyP · 2014-12-29T19:22:59Z

Thanks for the report, sturmd.

Is there any more housekeeping I need to do here to make sure this is ready for 0.16.0?

Komnomnomnom · 2014-12-30T21:07:04Z

@PollyP replace was removed from the string helper module but still exists on the string type

i.e.

In [1]: s = 'fof'

In [2]: s.replace('o', 'f')
Out[2]: 'fff'

will work fine in Python 3 but

In [3]: import string

In [4]: s = 'fof'

In [5]: string.replace(s, 'o', 'f')

won't. It's just some cruft they cleaned up; the replace method has not been removed or deprecated on the string type.
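A quick Python 3 check of this distinction: the str.replace method is available, the string module no longer has a replace function, and the ''.join(...split(',')) idiom mentioned earlier in the thread gives the same result. The variable names are illustrative only.

```python
import string

price_text = '1,234.56'

# str.replace is a method on string objects in both Python 2 and 3.
via_method = price_text.replace(',', '')

# The join/split idiom produces the same string.
via_join = ''.join(price_text.split(','))

# Python 3 dropped the module-level string.replace() helper.
print(hasattr(string, 'replace'))                    # False
print(via_method, via_join, via_method == via_join)  # 1234.56 1234.56 True
```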

jreback · 2015-01-02T17:21:40Z

pandas/io/data.py

-        underlying_price = float(root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\
-            .getchildren()[0].text)
+
+	underlying_price = root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\


looks like this needs 1 more space to format properly

jreback · 2015-01-02T17:22:49Z

@PollyP

pls address the above, rebase and squash. ping when green.

PollyP · 2015-01-02T23:25:29Z

OK, will do this this weekend.


PollyP · 2015-01-05T04:01:29Z

I wound up syncing and reapplying my changes to create a new PR #9198. I believe it is ready to go.

jreback · 2015-01-05T23:51:07Z

replaced by #9198

PollyP added 3 commits December 5, 2014 19:36

BUG: GH9010 updated Changelog with bug fix info

e38c7e8

BUG: GH9010: Add a _get_underlying_price test to Yahoo Options

4956c0a

jreback reviewed Dec 6, 2014

jreback added Bug Data Reader labels Dec 6, 2014

jreback reviewed Dec 7, 2014

performance improvement on BUG GH9010

b461ae2


jreback reviewed Dec 10, 2014

jreback added this to the 0.16.0 milestone Dec 13, 2014

jreback reviewed Jan 2, 2015

PollyP mentioned this pull request Jan 5, 2015

BUG: fix for GH9010 #9198

Closed

jreback closed this Jan 5, 2015
