ER: Better error reporting in get_data_yahoo #4025

Closed
wesm opened this issue Jun 25, 2013 · 20 comments
Labels
Error Reporting Incorrect or improved errors from pandas

@wesm
Member

wesm commented Jun 25, 2013

reported by a book reader

When processing a long list of symbols using get_data_yahoo it is essential
that you be able to see which symbol is causing the problem.  I was
importing a supposedly cleaned list of all the Russell 2000 stocks.  When I
used your example (page 139), it worked.  When I used my 'short list' it
worked.  When I used my full list it failed: HTTP Error 404: Not Found.
The only way to find the error was to add the stocks one by one until the
error occurred.  So what we need is something that kicks out a list of 'bad
symbols'.  When I looked at your code it seemed that you had an attempt at
that:

  in def dl_mult_symbols(symbol), i see:

      try:
         stocks[sym] = etc.
      except:
          warnings.warn('Error with sym: ' + sym + ' ...skipping')

But this is not being invoked.  All I am getting is the 404 warning with no
indication of which stock in the list caused the error.  I am not familiar
enough (or good enough) to read the code and make the necessary
corrections.

When I added the symbol 'ABVT' I got an error.  I checked it out:  ABVT was
AboveNet, Inc., which is now Zayo Group.  But obviously we need something
which kicks out a list of bad symbols, but keeps loading the good ones.

Below I give the traceback from trying to call ABVT.

 I am making a list of errors in the book (have only found a few) which I
will be glad to provide at any point.


HTTPError                                 Traceback (most recent call last)
<ipython-input-110-daf6874a2577> in <module>()
----> 1 web.get_data_yahoo('ABVT','1/1/2013','3/1/2013')

/Users/GJA/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/io/data.pyc
in get_data_yahoo(symbols, start, end, retry_count, pause, adjust_price,
ret_index, chunksize, **kwargs)
    328     if isinstance(symbols, (str, int)):
    329         sym = symbols
--> 330         hist_data = _get_hist_yahoo(sym, start=start, end=end)
    331     #Or multiple symbols, (e.g., ['GOOG', 'AAPL', 'MSFT'])
    332     elif isinstance(symbols, DataFrame):

/Users/GJA/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/io/data.pyc
in _get_hist_yahoo(sym, start, end, retry_count, pause, **kwargs)
    160
    161     for _ in range(retry_count):
--> 162         resp = urllib2.urlopen(url)
    163         if resp.code == 200:
    164             lines = resp.read()

/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/urllib2.pyc
in urlopen(url, data, timeout)
    124     if _opener is None:
    125         _opener = build_opener()
--> 126     return _opener.open(url, data, timeout)
    127
    128 def install_opener(opener):

/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/urllib2.pyc
in open(self, fullurl, data, timeout)
    404         for processor in self.process_response.get(protocol, []):
    405             meth = getattr(processor, meth_name)
--> 406             response = meth(req, response)
    407
    408         return response

/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/urllib2.pyc
in http_response(self, request, response)
    517         if not (200 <= code < 300):
    518             response = self.parent.error(
--> 519                 'http', request, response, code, msg, hdrs)
    520
    521         return response

/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/urllib2.pyc
in error(self, proto, *args)
    442         if http_err:
    443             args = (dict, 'default', 'http_error_default') + orig_args
--> 444             return self._call_chain(*args)
    445
    446 # XXX probably also want an abstract factory that knows when it makes

/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/urllib2.pyc
in _call_chain(self, chain, kind, meth_name, *args)
    376             func = getattr(handler, meth_name)
    377
--> 378             result = func(*args)
    379             if result is not None:
    380                 return result

/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/urllib2.pyc
in http_error_default(self, req, fp, code, msg, hdrs)
    525 class HTTPDefaultErrorHandler(BaseHandler):
    526     def http_error_default(self, req, fp, code, msg, hdrs):
--> 527         raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    528
    529 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found


@jtratner
Contributor

This is similar to the issue cropping up in other tests because some of the other network tests silently skip symbols that error. E.g. #4029, #4028.

How should pandas handle getting multiple symbols? Maybe it should mirror read_csv and take an error_bad_lines-like option (plus an option to just emit warnings). Then we could definitely catch any errors and combine them with the symbol, e.g. something like

try:
    some_get_data_func(symbol)
except IOError as e:  # IOError covers most/all network errors, possibly need OSError too
    msg = "Could not get information for %s, failure was: %r" % (symbol, e)
    if error_bad_lines:
        raise IOError(msg)
    elif warn_bad_lines:
        warnings.warn(msg)
    else:
        errors.append((symbol, msg))
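
A minimal runnable sketch of the snippet above, wrapped in a loop over symbols. The names `fetch_symbols` and `fake_fetch` are hypothetical and not part of pandas; `fake_fetch` stands in for the network call so that the example runs offline, and the two flags simply reuse the option names suggested above.

```python
import warnings


def fetch_symbols(symbols, fetch_one, error_bad_lines=False, warn_bad_lines=False):
    """Call fetch_one(symbol) for each symbol and handle failures per the flags.

    Hypothetical helper, not a pandas API.
    """
    results = {}
    errors = []
    for symbol in symbols:
        try:
            results[symbol] = fetch_one(symbol)
        except IOError as e:
            msg = "Could not get information for %s, failure was: %r" % (symbol, e)
            if error_bad_lines:
                # Fail fast, but name the offending symbol.
                raise IOError(msg)
            elif warn_bad_lines:
                warnings.warn(msg)
            else:
                # Stay quiet and hand the failures back to the caller.
                errors.append((symbol, msg))
    return results, errors


def fake_fetch(symbol):
    """Assumed stand-in for the real download; fails for one delisted symbol."""
    if symbol == "ABVT":
        raise IOError("HTTP Error 404: Not Found")
    return [1.0, 2.0, 3.0]


results, errors = fetch_symbols(["AAPL", "ABVT", "MSFT"], fake_fetch)
print(sorted(results))                      # symbols that loaded
print([symbol for symbol, _ in errors])     # symbols that failed
```

With both flags left at `False`, the good symbols still load and the failing one is reported alongside its message.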

@cpcloud
Member

cpcloud commented Jun 25, 2013

why not just raise an error? i'm not sure about "kicking out" bad symbols since after all they are invalid. throw an error telling which one was bad...
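
A rough sketch of what "an error telling which one was bad" could look like, assuming a hypothetical wrapper `fetch_with_context` around whatever function does the download (`fake_fetch` below is only a stand-in so the example runs without network access):

```python
def fetch_with_context(symbol, fetch_one):
    """Re-raise any IOError from fetch_one with the symbol in the message.

    Hypothetical helper, not a pandas API.
    """
    try:
        return fetch_one(symbol)
    except IOError as e:
        raise IOError("Failed to fetch symbol %r: %s" % (symbol, e)) from e


def fake_fetch(symbol):
    """Assumed stand-in for the real download; fails for one delisted symbol."""
    if symbol == "ABVT":
        raise IOError("HTTP Error 404: Not Found")
    return [1.0, 2.0, 3.0]


try:
    fetch_with_context("ABVT", fake_fetch)
except IOError as e:
    print(e)
```

The original exception stays attached as the cause, so the full traceback is still available for debugging.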

@jtratner
Contributor

well, what happens if you have 200 symbols (or symbols that you're getting from elsewhere) and the user wants to choose how to handle it? (e.g., just ignore symbols that fail) Would you have to call it twice? only use one symbol at a time?

@cpcloud
Member

cpcloud commented Jun 25, 2013

i just feel weird about the warn + error options. i think having the error option is a good idea but not warnings, because e.g., if 100 out of 200 symbols generate warnings that is not very useful

@cpcloud
Member

cpcloud commented Jun 25, 2013

what about returning not found symbols?

@cpcloud
Member

cpcloud commented Jun 25, 2013

more interactive that way: you can visually inspect the result rather than having to run things a bunch of times to see the warnings

@jtratner
Contributor

what or how would you return something? (given that the output result is a dataframe with the data)

@cpcloud
Member

cpcloud commented Jun 25, 2013

have a return_bad_symbols option that will return a (frame, list) when True otherwise just return the frame.

@cpcloud
Member

cpcloud commented Jun 25, 2013

empty list if True and no bad symbols, etc. details to iron out for whoever decides to fix this. i can do it in clean up data.py ...
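
One way the `return_bad_symbols` idea might look, as a sketch only. The function name `get_data` and the `fake_fetch` stand-in are assumptions for illustration, and a plain dict is used in place of the frame so the example has no dependencies:

```python
def get_data(symbols, fetch_one, return_bad_symbols=False):
    """Fetch every symbol; optionally also return the ones that failed.

    Hypothetical helper, not a pandas API. A dict stands in for the frame.
    """
    data = {}
    bad = []
    for symbol in symbols:
        try:
            data[symbol] = fetch_one(symbol)
        except IOError:
            bad.append(symbol)
    if return_bad_symbols:
        # Always a (data, list) pair; the list is empty when nothing failed.
        return data, bad
    return data


def fake_fetch(symbol):
    """Assumed stand-in for the real download; fails for one delisted symbol."""
    if symbol == "ABVT":
        raise IOError("HTTP Error 404: Not Found")
    return [1.0, 2.0, 3.0]


data, bad = get_data(["AAPL", "ABVT"], fake_fetch, return_bad_symbols=True)
print(bad)  # ['ABVT']
```

A trade-off of this design is that the return type depends on a flag, so callers have to know which form they asked for.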

@cpcloud
Member

cpcloud commented Jun 25, 2013

@jtratner @jreback @wesm what do u think about that API?

@cpcloud
Member

cpcloud commented Jun 25, 2013

this could be done in 0.12

@jreback
Contributor

jreback commented Jun 25, 2013

no......very easy to fix this; if a symbol is bad then the data is simply NaN; very consistent that way
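
A sketch of the NaN approach under simplifying assumptions: plain lists and a dict stand in for the DataFrame columns, and `get_closes` and `fake_fetch` are hypothetical names, not pandas functions. A symbol that fails to download still gets a column, filled with NaN:

```python
import math


def get_closes(symbols, dates, fetch_one):
    """Return {symbol: values}, using an all-NaN column for failed symbols.

    Hypothetical helper, not a pandas API.
    """
    table = {}
    for symbol in symbols:
        try:
            table[symbol] = fetch_one(symbol, dates)
        except IOError:
            # Keep the shape consistent: one value per date, all missing.
            table[symbol] = [math.nan] * len(dates)
    return table


def fake_fetch(symbol, dates):
    """Assumed stand-in for the real download; fails for one delisted symbol."""
    if symbol == "ABVT":
        raise IOError("HTTP Error 404: Not Found")
    return [float(i) for i in range(len(dates))]


dates = ["2013-01-02", "2013-01-03"]
table = get_closes(["AAPL", "ABVT"], dates, fake_fetch)
bad = [s for s, col in table.items() if all(math.isnan(v) for v in col)]
print(bad)  # ['ABVT']
```

The caller can recover the list of bad symbols afterwards by looking for all-NaN columns, so no extra return value or option is needed.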

@cpcloud
Member

cpcloud commented Jun 25, 2013

hm yes. not sure why i didn't think about that. ok then

@jreback
Contributor

jreback commented Jun 25, 2013

remember GIGO; pandas is not to determine what is good or bad, the service does that

@nehalecky
Contributor

@jreback, the nan option is a perfect fix. Sorry about the confusion with this one, I had meant to include handling non-existent symbols when I originally submitted the PR for this feature, but didn't get around to it.

@jtratner
Contributor

@jreback good solution 👍

@nehalecky
Contributor

I'll be busy today, but could try and get a PR submitted for later today if we want to try and include this in 11.1?

@jreback
Contributor

jreback commented Jun 25, 2013

sure

@jreback
Contributor

jreback commented Sep 27, 2013

@nehalecky PR on this?

@jreback
Contributor

jreback commented Feb 14, 2014

closing as stale

@jreback jreback closed this as completed Feb 14, 2014
5 participants