[PERF] Rolling Regressions #58

Closed
humdings opened this issue Jul 15, 2015 · 7 comments

Comments

@humdings
Contributor

I'd like to suggest we use the pandas implementation for rolling regressions.
https://github.com/pydata/pandas/blob/master/pandas/stats/ols.py

The performance difference is huge.

For a single factor

  • rolling_beta: 1 loops, best of 3: 4.26 s per loop
  • pd.ols: 1 loops, best of 3: 275 ms per loop

For multiple factors

  • rolling_multifactor_beta: 1 loops, best of 3: 15.2 s per loop
  • pd.ols: 1 loops, best of 3: 265 ms per loop

Pandas uses some sophisticated caching methods that make its implementation really fast. I'm also slightly suspicious of the differences between the results of the two implementations; I know the pandas method takes float precision into account.

If there are no arguments against this I can swap it out and submit a PR.
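To illustrate why a rolling regression does not need a full refit per window, here is a minimal sketch for the single-factor case. It is not the `pd.ols` code and makes no claim about how pandas caches internally; it only uses the identity that the OLS slope (with an intercept) equals cov(y, x) / var(x), computed with rolling moments. The function name `rolling_beta_moments` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def rolling_beta_moments(returns, factor, window):
    """Single-factor rolling OLS slope as rolling cov / rolling var.

    Hypothetical helper for illustration only. Both moments use the
    pandas default ddof=1, so the ratio equals the OLS slope of
    `returns` on `factor` (with an intercept) in each window.
    """
    cov = returns.rolling(window).cov(factor)
    var = factor.rolling(window).var()
    return cov / var


# Made-up data: returns are an exact linear function of the factor,
# so every full window should recover a slope of 2.
rng = np.random.default_rng(0)
factor = pd.Series(rng.normal(size=250))
returns = 1.0 + 2.0 * factor
beta = rolling_beta_moments(returns, factor, window=60)
```

The first `window - 1` entries are NaN because those windows are incomplete.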

@justinlent
Contributor

+1 @humdings

yeah, we should just go down the pandas route for everything that pandas supports, since it's very widely recognized and supported in industry

@gusgordon
Contributor

Yeah that sounds great! I am only concerned about the differences. How significant are they? May just be different methods of compounding returns or something.

@humdings
Contributor Author

The differences are not too severe, but they are not negligible. I pushed to a 'pandas-ols' branch; you can check it out there. I want to make sure none of the actual tear sheets break before it hits master.

@twiecki
Contributor

twiecki commented Jul 15, 2015

@humdings great, thanks! can you do a PR?

@twiecki
Contributor

twiecki commented Jul 16, 2015

Just looked at the code. Not only is it faster but also much more succinct. Should definitely merge this.

@twiecki
Contributor

twiecki commented Jul 16, 2015

#63

@eigenfoo
Contributor

eigenfoo commented Aug 1, 2017

Rolling regressions are now deprecated in pandas, and will be removed in a future version. I 100% agree with the point about performance between pd.ols and smf.ols, but statsmodels does not seem to be implementing a rolling ols any time soon... for now, we'll have to write our own rolling regressions, iterating over a rolling window. This is the problem and solution that #406 has also encountered.

Closing this issue because of the pandas deprecation.
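For reference, a hand-written rolling regression that iterates over a rolling window could look roughly like the sketch below. It is not the code from #406 or from this project; the function name `rolling_ols`, the column names, and the sample data are hypothetical, and it simply refits with `np.linalg.lstsq` in each window, which is simple but slower than a cached approach.

```python
import numpy as np
import pandas as pd


def rolling_ols(y, X, window):
    """Fit OLS of y on X (plus an intercept) in each trailing window.

    Hypothetical helper for illustration only. Returns a DataFrame of
    coefficients indexed like y; rows before the first full window
    are NaN.
    """
    X = X.copy()
    X.insert(0, "intercept", 1.0)
    out = pd.DataFrame(np.nan, index=y.index, columns=X.columns)
    for end in range(window, len(y) + 1):
        ys = y.iloc[end - window:end].to_numpy()
        Xs = X.iloc[end - window:end].to_numpy()
        # Least-squares solve for this window only.
        coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        out.iloc[end - 1] = coef
    return out


# Made-up data: y is an exact linear combination of two factors.
rng = np.random.default_rng(1)
factors = pd.DataFrame(rng.normal(size=(120, 2)), columns=["f1", "f2"])
y = 0.5 + 1.5 * factors["f1"] - 2.0 * factors["f2"]
coefs = rolling_ols(y, factors, window=30)
```

Each row of `coefs` holds the intercept and factor loadings estimated from the 30 observations ending at that row.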

@eigenfoo eigenfoo closed this as completed Aug 1, 2017