Fama-French multivariate regression #406
@twiecki @gusgordon the Python 3.4 build seems very sad without the
pyfolio/timeseries.py
        regression_df.index[rolling_window:]):
    window = regression_df.loc[beg:end]
    coeffs = sm.ols(formula='rets ~ SMB + HML + UMD - 1', data=window) \
        .fit().params.values
I also want to make sure that this computation is correct. Are these parameters the Fama-French betas? The -1 in the formula keyword means to set the intercept equal to 0.
we should not force the intercept to be 0.
Fixed!
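To illustrate the intercept question, here is a minimal sketch using plain NumPy least squares rather than the statsmodels formula from the diff. The data is synthetic and the values of `true_alpha` and `true_betas` are made up for the example. Dropping the constant column corresponds to the `- 1` case: any alpha is pushed into the residuals, and the betas can be biased when the factor means are not zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical factor returns (columns stand in for SMB, HML, UMD).
factors = rng.normal(0.0, 0.01, size=(n, 3))
true_betas = np.array([0.5, -0.2, 0.3])   # assumed values, for illustration only
true_alpha = 0.001                        # assumed value, for illustration only
rets = true_alpha + factors @ true_betas + rng.normal(0.0, 0.001, size=n)

# No constant column: the fit is forced through the origin (the '- 1' case).
betas_no_const, *_ = np.linalg.lstsq(factors, rets, rcond=None)

# With a constant column: alpha is estimated alongside the betas.
X = np.column_stack([np.ones(n), factors])
params, *_ = np.linalg.lstsq(X, rets, rcond=None)
alpha, betas = params[0], params[1:]

print("betas without intercept:", betas_no_const)
print("alpha with intercept:   ", alpha)
print("betas with intercept:   ", betas)
```

In the statsmodels formula interface, leaving out the `- 1` should have the same effect as adding the column of ones here, since the formula adds an intercept by default.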
Yes, we need to add statsmodels to the requirements file.
Added. Other than that, the PR is ready for review and merge! @gusgordon @twiecki
@twiecki the Travis builds are not failing, but are timing out. See here for an example. I've tried implementing a solution using
@georgh0021 how slow is the regression when you try it locally?
If performance is an issue, it may be worth looking into Pythonic's answer to this forum post. He implements a numpy-only solution with linear algebra...
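As a rough idea of what a NumPy-only, linear-algebra approach can look like (this is not the code from the forum post, which is not reproduced here), the sketch below builds all windows at once and solves the normal equations in a batch. The function name `rolling_ols_vectorized` is hypothetical, the data is synthetic, and `sliding_window_view` needs NumPy 1.20 or later. Solving the normal equations is less numerically robust than `lstsq`, so treat it as a speed-oriented sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_ols_vectorized(y, X, window):
    """Rolling OLS coefficients for every full window, without a Python loop.

    y: shape (n,). X: shape (n, k) design matrix; include a column of ones
    if an intercept is wanted. Returns an array of shape (n - window + 1, k).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Windows are taken along axis 0; the window axis is appended last.
    Xw = sliding_window_view(X, window, axis=0)   # (m, k, window)
    yw = sliding_window_view(y, window)           # (m, window)
    xtx = np.einsum("mki,mji->mkj", Xw, Xw)       # per-window X'X, (m, k, k)
    xty = np.einsum("mki,mi->mk", Xw, yw)         # per-window X'y, (m, k)
    # Batched solve of (X'X) b = X'y for each window.
    return np.linalg.solve(xtx, xty[..., None])[..., 0]


rng = np.random.default_rng(42)
n, window = 300, 60
factors = rng.normal(0.0, 0.01, size=(n, 3))      # synthetic factor returns
X = np.column_stack([np.ones(n), factors])        # intercept + three factors
y = X @ np.array([0.0005, 0.4, -0.1, 0.2]) + rng.normal(0.0, 0.002, size=n)

coeffs = rolling_ols_vectorized(y, X, window)
print(coeffs.shape)
```

Each row of `coeffs` holds the intercept followed by the three factor loadings for one window, with row 0 covering observations 0 through `window - 1`.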
4.52 secs is quite slow for this simple functionality. How long a time range did you check this on? The numpy version is not ideal but is perhaps our best shot if we want to keep the functionality.
Why do you test on only one factor?
Laziness 😛 The code on StackOverflow only took in one variable, so I just copied and pasted it to see what it would look like. Will refactor and test today.
Not sure I trust that code. Have you tried
@twiecki Travis can't seem to find the sklearn package though. Any ideas?
Great! it's
@twiecki @gusgordon back to square one: Python 3.4 build is still timing out. More help needed, unfortunately.
@richafrank @twiecki bump. No idea why the tests are timing out.
Hm, seems like 3.4 and 3.5 are pretty close to the limit. At this point we can probably drop 3.4 altogether.
Not sure why the builds aren't timing out now... Something to keep in mind going forward I suppose. Once this becomes a serious problem we can look into finding a solution.
Well, I think it's right on the razor's edge. We should probably just test over a shorter time period.
Fixes #379
We don't just want a multivariate regression, we want a rolling multivariate regression. Pandas used to support this sort of thing with pd.stats.ols.MovingOLS, but that has unfortunately been deprecated. A solution is described on StackOverflow, but frustratingly this solution doesn't work since apply only works on Series data (see citynorman's comment on the top answer). So, we will have to write our own rolling multivariate regression...
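For reference, a hand-written rolling multivariate regression can be fairly small. The sketch below is not the implementation in this PR: the name `rolling_multivariate_ols` is hypothetical, it works on plain NumPy arrays rather than a DataFrame, and it assumes an intercept is wanted. Rows before the first full window are left as NaN so the output lines up with the input observations.

```python
import numpy as np


def rolling_multivariate_ols(rets, factors, window):
    """Loop-based rolling OLS of rets on factors, with an intercept.

    rets: shape (n,). factors: shape (n, k).
    Returns shape (n, k + 1): column 0 is the intercept (alpha), the rest
    are the factor betas. Row i uses the `window` observations ending at i.
    """
    rets = np.asarray(rets, dtype=float)
    factors = np.asarray(factors, dtype=float)
    n, k = factors.shape
    X = np.column_stack([np.ones(n), factors])
    out = np.full((n, k + 1), np.nan)
    for end in range(window, n + 1):
        rows = slice(end - window, end)
        coeffs, *_ = np.linalg.lstsq(X[rows], rets[rows], rcond=None)
        out[end - 1] = coeffs
    return out


# Noise-free synthetic data, so every window recovers the same coefficients.
rng = np.random.default_rng(1)
factors = rng.normal(0.0, 0.01, size=(120, 3))
rets = 0.002 + factors @ np.array([0.7, -0.3, 0.1])

result = rolling_multivariate_ols(rets, factors, window=30)
print(result[29])   # first row with a full window
```

The explicit loop keeps each window's fit easy to check against a one-off regression, at the cost of one `lstsq` call per window.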