Mismatched indexes in X,y esp. with sklearn pipelines #280
Comments
I'd be up for making a PR, but am new to this project. I think it might be nicest to add a function that converts/checks both
Thoughts?
exactly, this is especially dangerous in a cross validate setting
I realized that resetting index can solve the problem.
Hi @bmreiniger thanks for pointing this issue out. If you still want to make a PR your help is much appreciated.
@PaulWestenthanner I'll give it a shot, sure. And thanks for the heads up about X-only convert_input.
When `X` is a numpy array but `y` is a pandas Series (which is the case e.g. when `X` was converted by `sklearn`), the `convert_input...` functions called e.g. in https://github.com/bmreiniger/category_encoders/blob/a810a4b7abfce9fc4eb7fc401e3d37f2c1c6e402/category_encoders/target_encoder.py#L118 don't give the resulting pandas objects the same index. This causes `TargetEncoder`, `WOEEncoder`, `LeaveOneOutEncoder`, `CatBoostEncoder`, and `JamesSteinEncoder` (any others?) to miscalculate the encodings, e.g. at https://github.com/bmreiniger/category_encoders/blob/a810a4b7abfce9fc4eb7fc401e3d37f2c1c6e402/category_encoders/target_encoder.py#L172 (the `groupby` matches up by index).

This is the cause (or at least one of the causes) of #272.
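The index alignment described here can be reproduced with pandas alone, without going through any encoder. The following is a minimal sketch, not the library's code: the column name `col` and the toy data are made up for illustration, and the `reset_index` call is the workaround mentioned in the comments rather than a proposed fix for the library.

```python
import numpy as np
import pandas as pd

# X arrives as a plain numpy array (e.g. after an sklearn transformer),
# while y is still a pandas Series carrying a non-default index.
X_arr = np.array(["a", "b", "a", "b"])
y = pd.Series([1, 1, 0, 0], index=[2, 3, 4, 5])

# Wrapping the array in a DataFrame gives it a fresh RangeIndex 0..3.
# "col" is a hypothetical column name used only for this illustration.
X = pd.DataFrame({"col": X_arr})

# groupby aligns the grouping column to y's index by label, not by position.
# Only labels 2 and 3 exist in both indexes, so only two rows contribute.
misaligned = y.groupby(X["col"]).mean()

# Workaround from the comments: drop y's index so both sides are positional.
aligned = y.reset_index(drop=True).groupby(X["col"]).mean()

print(misaligned.to_dict())
print(aligned.to_dict())
```

With this toy data the misaligned group means come out as 1.0 for both categories, while the positionally matched means are 0.5 for both: the partially overlapping index yields values that are not `NaN` but are still wrong.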
Actual Behavior
outputs
More nefarious problems occur when the indexes partially match up so that the returned values aren't `NaN` but are incorrect.

Specifications