Skip to content

[ENH] Fama/French #56

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 29, 2015
Merged

[ENH] Fama/French #56

merged 1 commit into from
Aug 29, 2015

Conversation

0x0L
Copy link
Contributor

@0x0L 0x0L commented Aug 14, 2015

  • Added get_datasets_famafrench (requires bs4) : list all available datasets from FF
  • Complete rewrite of get_data_famafrench. Can grab and parse all (so called) csv files from FF website. Also uses metadata in the files to generate documentation (see DESCR attribute)
  • Moved tests for FF to their own file (One file per datareader #59)

API change : indexes are now pandas.PeriodIndex for annual and monthly data, and pandas.DatetimeIndex otherwise.

Should fix #55 #20 #21 see also pandas-dev/pandas#8842

@0x0L 0x0L mentioned this pull request Aug 14, 2015
@0x0L 0x0L force-pushed the famafrench branch 4 times, most recently from a7ca65c to b69c110 Compare August 15, 2015 03:33
except: pass
try: return dt.datetime.strptime(x, '%Y%m')
except: pass
return to_datetime(x)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks terrible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of any other way to do that unless we import dateutil.

@0x0L 0x0L force-pushed the famafrench branch 2 times, most recently from 4d0f191 to 665a1c7 Compare August 15, 2015 14:06
root = BeautifulSoup(socket.read(), 'html.parser')

l = filter(lambda x: x.startswith(_FF_PREFIX) and x.endswith(_FF_SUFFIX),
[e.attrs['href'] for e in root.findAll('a') if 'href' in e.attrs])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any substitute for findAll which does not involve any import ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a dependency on lxml, so maybe use that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check this gist out:
https://gist.github.com/yejianye/5895954

Also, we could pull the Options._parse_url method out into a common function, as it does the importing and error checking for lxml.

@davidastephens davidastephens added this to the 0.2.0 milestone Aug 23, 2015
@0x0L
Copy link
Contributor Author

0x0L commented Aug 23, 2015

@davidastephens done

@davidastephens
Copy link
Member

Thanks. We will need to rebase/squash after #69 is merged, but other than that, looks good to me.

@davidastephens
Copy link
Member

#69 merged, can you rebase?

@0x0L
Copy link
Contributor Author

0x0L commented Aug 24, 2015

Done

I also added a test which fails on python 2.6.
It seems like it's a bug in pandas 0.13.1: docstring claims period inference should work but it doesn't

>>> df[0].to_period()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python/2.6.9/lib/python2.6/site-packages/pandas/core/frame.py", line 4258, in to_period
    freq = self.index.freqstr or self.index.inferred_freq
  File "/opt/python/2.6.9/lib/python2.6/site-packages/pandas/tseries/index.py", line 1393, in freqstr
    return self.offset.freqstr
AttributeError: 'NoneType' object has no attribute 'freqstr'
>>> df[0].to_period('M')
         Mkt-RF   SMB    HML    RF
1926-07    2.96 -2.30  -2.87  0.22
1926-08    2.64 -1.40   4.19  0.25
1926-09    0.36 -1.32   0.01  0.23
1926-10   -3.24  0.04   0.51 ...
...
>>> print df[0].to_period.__doc__

        Convert DataFrame from DatetimeIndex to PeriodIndex with desired
        frequency (inferred from index if not passed)

        Parameters
        ----------
        freq : string, default
        axis : {0, 1}, default 0
            The axis to convert (the index by default)
        copy : boolean, default True
            If False then underlying input data is not copied

        Returns
        -------
        ts : TimeSeries with PeriodIndex

@davidastephens
Copy link
Member

@jreback @jorisvandenbossche Thoughts?

Need to keep supporting pandas 13.1 or move that one to an 'allowed failure'?

* add get_available_datasets (requires lxml)
* complete rewrite of get_data_famafrench
@0x0L
Copy link
Contributor Author

0x0L commented Aug 28, 2015

@davidastephens i wrote a workaround for 13.1
(tests pass for famafrench, failed on Fred on HTTP 503 error)

@davidastephens
Copy link
Member

Thanks!

davidastephens added a commit that referenced this pull request Aug 29, 2015
ENH: Fama/French re-write.
@davidastephens davidastephens merged commit f7b92a2 into pydata:master Aug 29, 2015
@0x0L 0x0L deleted the famafrench branch October 3, 2015 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Fama/French
2 participants