Skip to content

[ENH] Switch regressions to use the pandas implementation #63

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

twiecki
Copy link
Contributor

@twiecki twiecki commented Jul 16, 2015

There were several different methods to calculate regressions
being used. We should use a consistent method everywhere
and Pandas does fast rolling regressions with all sorts of
bells and whistles.

There were several different methods to calculate regressions
being used. We should use a consistent method everywhere
and Pandas does fast rolling regressions with all sorts of
bells and whistles.
@humdings
Copy link
Contributor

I have some questions/concerns about our regression calculations before I do more work on this.

  1. I am thinking we should return fitted ols objects rather than just the parameters everywhere. There's a lot of information we are throwing out by returning the parameters alone. Maybe we can break regression into a single function that is called by all the others to make sure we alway have access to the original object. Does anybody have an opinion on this.
  2. We seen to use/return an intercept in some regression cases but not others. Are there any cases where we do not want to use intercepts to perform a regression?
  3. I think 2 has caused some test cases to break or be written incorrectly, I plan to fix them up here as well.

@twiecki
Copy link
Contributor Author

twiecki commented Jul 23, 2015

@humdings I agree with all your points. My only concern regarding 1. is usability and principle of least surprise (and returning an object rarely is not surprising). Maybe there's a way to have both though but I can't think of something I like..

David Edwards added 3 commits July 23, 2015 12:15
I think there are still some inconsistencies in some of the tests
here with regards to how we are treating constant terms in
regressions.
I created two new regression calculations
 - regression
 - rolling_regression

These two functions return the actual ols object so functions
can have access to the other metrics the object contains.

I also added a 'return_ols_obj' kwarg to functions that
call the regression funcs. If it is True, the ols object is
also returned. This is for the same reason as above.
@gusgordon
Copy link
Contributor

Are we going to get this in for the release?

@twiecki
Copy link
Contributor Author

twiecki commented Jul 27, 2015

I think this one is optional.

In favor of simplicity I removed the option to return
the ols model from the calculations that perform
regressions. e.g. It is simpler if the beta function just
returns the beta coeffecients.

All of the regression calculations still call functions
that return the underlying model as I think it will be
important to be able to access other model data in the future.
@humdings
Copy link
Contributor

bump!

Bringing this back up. I think we can include this in the initial ship. It's a pretty simple change that
just standardizes the functions used to perform regressions.

I removed the kwarg to return the model from each of the functions, but I do think having functions
available that do return the model are important.

This still requires test rewrites which is why travis fails.

@gusgordon
Copy link
Contributor

OK, a lot of flake8 errors here have been fixed, and there are going to be merge conflicts. Maybe it makes sense to reset this branch and just make the changes necessary from there? Let me know if you want me to do anything, too.

@twiecki
Copy link
Contributor Author

twiecki commented Jul 27, 2015

@gusgordon That's probably the best way forward.

@matthewgilbert
Copy link

Just thought I would add this in pandas-dev/pandas#6077. At some point pandas hopes to move the stats stuff to statsmodels and deprecate it, although as of now I don't believe there is any fast rolling or expanding OLS stuff in statsmodels.

@twiecki
Copy link
Contributor Author

twiecki commented Aug 25, 2015

Thanks @matthewgilbert, good to know.

@twiecki
Copy link
Contributor Author

twiecki commented Oct 7, 2015

Closing due to inactivity. Hopefully we can implement this at a later point.

@twiecki twiecki closed this Oct 7, 2015
@gusgordon gusgordon deleted the pandas-ols branch November 3, 2015 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants