Skip to content

Propose more robust datetime to unix time conversion in yahoo/daily.py #378

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
shipmints opened this issue Aug 4, 2017 · 13 comments
Open

Comments

@shipmints
Copy link

Apologies as I'm new to git and github. I can try to make a fork/patch/pull request but I have to figure out how. In the meantime, I thought I'd offer the following to the local experts who surely can do this an order of magnitude faster than I can...

    def _get_params(self, symbol):
# I propose changing
        unix_start = int(time.mktime(self.start.timetuple()))
# to
        unix_start = (start - datetime.datetime(1970, 1, 1)).total_seconds()
# as it avoids mktime overflows
# e.g., try running the code with 1900/01/01 as the start range
@shipmints
Copy link
Author

Corrected snippet to include self...

unix_start = (self.start - datetime.datetime(1970, 1, 1)).total_seconds()

@stoffprof
Copy link
Contributor

stoffprof commented Aug 24, 2017

This would resolve the error one gets when trying to pull data pre-1970 (on a Windows machine):

stdt = datetime(1950,1,1)
df = web.DataReader('^GSPC', 'yahoo', stdt)

     98     def _get_params(self, symbol):
---> 99         unix_start = int(time.mktime(self.start.timetuple()))
    100         day_end = self.end.replace(hour=23, minute=59, second=59)
    101         unix_end = int(time.mktime(day_end.timetuple()))

OverflowError: mktime argument out of range 

I think this would also work:

unix_start = int(calendar.timegm((self.start.timetuple()))

@femtotrader
Copy link
Contributor

femtotrader commented Aug 24, 2017

@shipmints
Be aware that you can't substract timezone aware datetime and timezone naive datetime
So it may fail if self.start is a timezone aware datetime (because your epoch is datetime.datetime(1970, 1, 1) which is a timezone naive datetime)

EPOCH should in fact be datetime.datetime(1970, 1, 1, tzinfo=pytz.UTC)

so we may wonder what will happen if self.start is timezone aware but not UTC?

What ever change in pandas-datareader code will be... we should have _sanitize_dates function in mind

def _sanitize_dates(start, end):

Calculating difference between 2 datetime with differents timezone (according DST or not ...) may be a bit tricky (but some Python library can do this)

With Python 3.6 and Pandas 0.20 there is a very convenient method:

pd.Timestamp('1970-01-01', tz='UTC').timestamp()

unfortunately not with Python 2.7 and same Pandas version

@femtotrader
Copy link
Contributor

A workaround may be to define (probably in _utils.py) this function

def _timestamp(ts):
    """
    Return unix timestamp for Pandas Timestamp ts
    See https://github.com/pandas-dev/pandas/issues/17329
    """
    try:
        return ts.timestamp()
    except AttributeError:
        return ts.value // 1000 / 1000000.0

With Python 3.6 (same result with Python 2.7)

In [161]: _timestamp(pd.Timestamp("1970-1-1", tz="UTC"))
Out[161]: 0.0

In [162]: _timestamp(pd.Timestamp("1970-1-1", tz="US/Central"))
Out[162]: 21600.0

In [163]: _timestamp(pd.Timestamp("1970-1-1", tz="Europe/Paris"))
Out[163]: -3600.0

@femtotrader
Copy link
Contributor

femtotrader commented Aug 25, 2017

but the problem of this code is that it returns a different value when no timezone is given

With Python 3.6 (my system timezone is Europe/Paris)

In [167]: _timestamp(pd.Timestamp("1970-1-1"))
Out[167]: -3600.0

with Python 2.7

In [43]: _timestamp(pd.Timestamp("1970-1-1"))
Out[43]: 0.0

Pinging @chris-b1 @gfyoung

@femtotrader
Copy link
Contributor

femtotrader commented Aug 25, 2017

Maybe

def _timestamp(ts):
    """
    Return unix timestamp for Timestamp ts
    See https://github.com/pandas-dev/pandas/issues/17329
    """
    try:
        return ts.timestamp()
    except AttributeError:
        timestamp = ts.value // 1000 / 1000000.0
        if ts.tz is None:
             timestamp += time.timezone
        return timestamp


In [36]: _timestamp(pd.Timestamp("1970-1-1", tz="UTC"))
Out[36]: 0.0

In [37]: _timestamp(pd.Timestamp("1970-1-1", tz="US/Central"))
Out[37]: 21600.0

In [38]: _timestamp(pd.Timestamp("1970-1-1", tz="Europe/Paris"))
Out[38]: -3600.0

In [69]: _timestamp(pd.Timestamp("1970-1-1"))  # use local timezone
Out[69]: -3600.0

@stoffprof
Copy link
Contributor

@jreback Should this be considered a bug? Can't download pre-1970 data from Yahoo even when it is available (on Windows machines).

@jreback
Copy link
Contributor

jreback commented Aug 28, 2017

the patch by @femtotrader looks fine if u want to submit a PR with tests

@bashtage
Copy link
Contributor

Yahoo has been deprecated.

@stoffprof
Copy link
Contributor

I don't think this was ever implemented in yahoo, which is no longer deprecated. I can do a PR if needed. @femtotrader - were you suggesting that that _timestamp() be called within _sanitize_dates() on start and end and then using @shipmints solution of

unix_start = (start - datetime.datetime(1970, 1, 1)).total_seconds()

@bashtage - thoughts?

@stoffprof
Copy link
Contributor

To be clear,

 pdr.get_data_yahoo('^GSPC', '1965')

raises a mktime argument out of range exception on Windows but not Mac.

Also, since yahoo only returns daily data, how important is the time zone issue? Seems like tz could safely be ignored.

@bashtage
Copy link
Contributor

Doesn't seem like a high priority issue, but if you have some time and want to fix...

@bashtage bashtage reopened this Mar 20, 2019
@gliptak
Copy link
Contributor

gliptak commented Sep 22, 2019

Was this improved with #703 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants