Fama-French multivariate regression #406

Merged
merged 24 commits into from
Aug 21, 2017

Conversation

eigenfoo
Contributor

@eigenfoo eigenfoo commented Jul 31, 2017

Fixes #379

We don't just want a multivariate regression, we want a rolling multivariate regression. Pandas used to support this sort of thing with pd.stats.ols.MovingOLS, but that has unfortunately been deprecated.

A solution is described on StackOverflow, but frustratingly this solution doesn't work since apply only works on Series data (see citynorman's comment on the top answer). So, we will have to write our own rolling multivariate regression...
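For illustration, a rolling multivariate regression can be written by hand in a few lines. This is a minimal sketch, not the code in this PR: it assumes plain NumPy arrays rather than pandas objects, and the function name `rolling_multivariate_ols` and the synthetic factor data are hypothetical.

```python
import numpy as np


def rolling_multivariate_ols(y, X, window):
    """Fit y ~ intercept + X by least squares over each trailing window.

    Returns an array of shape (n_obs - window + 1, n_factors + 1).
    Column 0 is the intercept; the remaining columns are the factor betas.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n_windows = len(y) - window + 1
    if n_windows <= 0:
        raise ValueError("window is longer than the data")
    # Prepend a column of ones so the intercept is estimated, not forced to 0.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs = np.empty((n_windows, design.shape[1]))
    for i in range(n_windows):
        rows = slice(i, i + window)
        coeffs[i], *_ = np.linalg.lstsq(design[rows], y[rows], rcond=None)
    return coeffs


# Synthetic, noiseless data: three made-up factors standing in for SMB, HML, UMD.
rng = np.random.default_rng(0)
factors = rng.normal(size=(60, 3))
rets = 0.01 + factors @ np.array([0.5, -0.2, 0.3])
betas = rolling_multivariate_ols(rets, factors, window=20)
```

Because the synthetic returns contain no noise, every window should recover the same intercept and betas, which makes the sketch easy to sanity-check before trying it on real returns.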

@eigenfoo eigenfoo added the bug label Jul 31, 2017
@eigenfoo
Contributor Author

@twiecki @gusgordon the Python 3.4 build seems very sad without the statsmodels module

  File "/home/travis/build/quantopian/pyfolio/pyfolio/timeseries.py", line 24, in <module>
    import statsmodels.formula.api as sm
ImportError: No module named 'statsmodels'

        regression_df.index[rolling_window:]):
    window = regression_df.loc[beg:end]
    coeffs = sm.ols(formula='rets ~ SMB + HML + UMD - 1', data=window) \
        .fit().params.values
Contributor Author

I also want to make sure that this computation is correct. Are these parameters the Fama French betas?

The - 1 in the formula string drops the intercept term, i.e. it forces the intercept to be 0.

Contributor

we should not force the intercept to be 0.
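To see why the intercept matters, here is a small sketch under stated assumptions: a single made-up factor with a nonzero mean, noiseless synthetic returns, and NumPy's least-squares solver standing in for the statsmodels formula. None of the names or numbers come from the PR.

```python
import numpy as np

rng = np.random.default_rng(1)
# One hypothetical factor with a nonzero mean.
factor = rng.normal(loc=1.0, scale=0.5, size=500)
# Synthetic returns: true intercept 0.02, true beta 0.8, no noise.
rets = 0.02 + 0.8 * factor

# No-intercept fit: the design matrix holds only the factor,
# which is what `- 1` in the formula asks for.
beta_no_icpt, *_ = np.linalg.lstsq(factor[:, None], rets, rcond=None)

# Fit with an intercept: add a column of ones to the design matrix.
design = np.column_stack([np.ones_like(factor), factor])
with_icpt, *_ = np.linalg.lstsq(design, rets, rcond=None)

print("beta, intercept forced to 0:", beta_no_icpt[0])
print("intercept and beta, intercept estimated:", with_icpt)
```

With the intercept suppressed, the constant part of the returns has nowhere to go except into the slope, so the estimated beta drifts away from 0.8; with the intercept estimated, both parameters are recovered.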

Contributor Author

Fixed!

@eigenfoo eigenfoo requested a review from twiecki July 31, 2017 20:07
@twiecki
Contributor

twiecki commented Jul 31, 2017

Yes, we need to add statsmodels to the requirements file.

@eigenfoo
Contributor Author

eigenfoo commented Aug 1, 2017

Added statsmodels to the requirements file and fixed the bug that forced the regression intercept to be 0. The Travis builds seem to be passing, although the Python 3.4 build timed out: I'll try to implement this solution tomorrow morning.

Other than that, the PR is ready for review and merge! @gusgordon @twiecki

@eigenfoo
Contributor Author

eigenfoo commented Aug 1, 2017

@twiecki the Travis builds are not failing, but are timing out. See here for an example. I've tried implementing a solution using travis_wait, but that does not seem to help: builds still time out. @gusgordon and I don't really know what's going on. Any thoughts?

@twiecki
Contributor

twiecki commented Aug 1, 2017

@georgh0021 how slow is the regression when you try it locally?

@eigenfoo
Contributor Author

eigenfoo commented Aug 1, 2017

@twiecki

In [13]: %timeit rolling_fama_french(returns, factor_returns)
         1 loop, best of 3: 4.52 s per loop

If performance is an issue, it may be worth looking into Pythonic's answer to this forum post. He implements a numpy-only solution with linear algebra...

In [14]: %timeit rolling_regression(factor_returns.iloc[:, 0], returns)
         10 loops, best of 3: 61.8 ms per loop

@twiecki
Contributor

twiecki commented Aug 2, 2017

4.52 secs is quite slow for this simple functionality. How long a time-range did you check this on? The numpy version is not ideal but perhaps our best shot if we want to keep the functionality.

@twiecki
Contributor

twiecki commented Aug 2, 2017

Why do you test on only one factor (factor_returns.iloc[:, 0])?

@eigenfoo
Contributor Author

eigenfoo commented Aug 2, 2017

Laziness 😛 the code on StackOverflow only took in one variable, I just copied and pasted to see what it would look like. Will refactor and test today.

@twiecki
Contributor

twiecki commented Aug 2, 2017

Not sure I trust that code. Have you tried sklearn.linear_model.LinearRegression?

@eigenfoo
Contributor Author

eigenfoo commented Aug 2, 2017

@twiecki sklearn runs much faster!
1 loop, best of 3: 479 ms per loop
And you were right, the regression matches the regression done by statsmodels, while the numpy solution was weird and incorrect.
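A rolling fit with sklearn.linear_model.LinearRegression might look roughly like the sketch below. It is an illustration only, not the code merged in this PR: the function name `rolling_betas_sklearn`, the window length, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def rolling_betas_sklearn(y, X, window):
    """Refit LinearRegression on each trailing window of `window` rows.

    Returns one row per window: [intercept, beta_1, ..., beta_k].
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    model = LinearRegression(fit_intercept=True)
    rows = []
    for start in range(len(y) - window + 1):
        sl = slice(start, start + window)
        model.fit(X[sl], y[sl])
        rows.append(np.concatenate([[model.intercept_], model.coef_]))
    return np.array(rows)


# Synthetic noisy data with three made-up factors.
rng = np.random.default_rng(2)
factors = rng.normal(size=(50, 3))
rets = (0.01 + factors @ np.array([0.5, -0.2, 0.3])
        + rng.normal(scale=0.05, size=50))
betas = rolling_betas_sklearn(rets, factors, window=25)
```

A quick way to build trust in such a function is to compare one window against an independent least-squares solve (for example np.linalg.lstsq with a column of ones added), in the same spirit as the comparison against statsmodels described above.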

Travis can't seem to find the sklearn package though. Any ideas?

PackageNotFoundError: Packages missing in current channels:
            
  - sklearn
We have searched for the packages in the following channels:
            
  - https://repo.continuum.io/pkgs/free/linux-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/linux-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/linux-64
  - https://repo.continuum.io/pkgs/pro/noarch
            
The command "conda create -q -n testenv --yes python=$TRAVIS_PYTHON_VERSION ipython pyzmq numpy scipy nose matplotlib pandas Cython patsy flake8 seaborn sklearn runipy pytables networkx pandas-datareader matplotlib-tests joblib" failed and exited with 1 during .
Your build has been stopped.

@twiecki
Contributor

twiecki commented Aug 2, 2017

Great! The conda package is named scikit-learn, not sklearn, so it's conda install scikit-learn.

@eigenfoo eigenfoo requested a review from gusgordon August 2, 2017 19:18
@eigenfoo
Contributor Author

eigenfoo commented Aug 2, 2017

@twiecki @gusgordon back to square one: Python 3.4 build is still timing out. More help needed, unfortunately.

@eigenfoo eigenfoo mentioned this pull request Aug 14, 2017
@eigenfoo eigenfoo added this to the v1.0.0 milestone Aug 15, 2017
@eigenfoo
Contributor Author

@richafrank @twiecki bump. No idea why the tests are timing out.

@twiecki
Contributor

twiecki commented Aug 21, 2017

Hm, seems like 3.4 and 3.5 are pretty close to the limit. At this point we can probably drop 3.4 altogether.

@twiecki twiecki merged commit 17bbfe3 into master Aug 21, 2017
@twiecki twiecki deleted the ff_multivar branch August 21, 2017 09:39
@eigenfoo
Contributor Author

Not sure why the builds aren't timing out now... Something to keep in mind going forward I suppose. Once this becomes a serious problem we can look into finding a solution.

@twiecki
Contributor

twiecki commented Aug 21, 2017

Well, I think it's right on the razor's edge. We should probably just test over a shorter time-period.
