ENH: try to preserve the dtype on combine_first for the case where the two DataFrame objects have the same columns #39051
Merged: jreback merged 13 commits into pandas-dev:master from danielhrisca:keep_dtypes_on_combine_first on Jan 15, 2021
Changes from 3 commits
Commits
0c1d126 ENH: add argument to preserve dtypes of common columns in combine_first (danielhrisca)
1a5fe0f fix black code style (danielhrisca)
24f6ffc fix misspelled word in docstring (danielhrisca)
d0f9ed3 update tests and remove preserve_dtypes argument from combine_first (danielhrisca)
198eaa4 fix isort and flake8 errors (danielhrisca)
f209590 updates according to erview (danielhrisca)
5e252d0 update whatsnew with example code (danielhrisca)
7c67e3c wrong header in documentation (danielhrisca)
47d0911 fix black code style (danielhrisca)
1b5691c update whatsnew with example code as requested in the review (danielhrisca)
ba49f9c remove redundant check (danielhrisca)
a2d4e38 further fix and polish the whatsnew entry (danielhrisca)
f937928 fix single letter variable names (danielhrisca)
@@ -24,6 +24,10 @@ def test_combine_first_mixed(self):
        combined = f.combine_first(g)
        tm.assert_frame_equal(combined, exp)

        exp = DataFrame({"A": list("abab"), "B": [0, 1, 0, 1]}, index=[0, 1, 5, 6])
        combined = f.combine_first(g, preserve_dtypes=True)
        tm.assert_frame_equal(combined, exp)

    def test_combine_first(self, float_frame):
        # disjoint
        head, tail = float_frame[:5], float_frame[5:]

@@ -363,9 +367,16 @@ def test_combine_first_int(self):
        expected_12 = DataFrame({"a": [0, 1, 3, 5]}, dtype="float64")
        tm.assert_frame_equal(result_12, expected_12)

        result_12 = df1.combine_first(df2, preserve_dtypes=True)
        expected_12 = DataFrame({"a": [0, 1, 3, 5]})
        tm.assert_frame_equal(result_12, expected_12)

        result_21 = df2.combine_first(df1)
        expected_21 = DataFrame({"a": [1, 4, 3, 5]}, dtype="float64")
        tm.assert_frame_equal(result_21, expected_21)

        result_21 = df2.combine_first(df1, preserve_dtypes=True)
        expected_21 = DataFrame({"a": [1, 4, 3, 5]})
        tm.assert_frame_equal(result_21, expected_21)

    @pytest.mark.parametrize("val", [1, 1.0])

@@ -439,3 +450,35 @@ def test_combine_first_with_nan_multiindex():
        index=mi_expected,
    )
    tm.assert_frame_equal(res, expected)


def test_combine_preserve_dtypes():
    a = Series(["a", "b"], index=range(2))
[review comment] add the issue number as a comment
    b = Series(range(2), index=range(2))
    f = DataFrame({"A": a, "B": b})

    c = Series(["a", "b"], index=range(5, 7))
    b = Series(range(-1, 1), index=range(5, 7))
    g = DataFrame({"B": b, "C": c})
[review comment] nitpick: can you avoid 1-letter variable names? It makes it harder to grep for things.

    exp = DataFrame(
        {
            "A": ["a", "b", np.nan, np.nan],
            "B": [0.0, 1.0, -1.0, 0.0],
            "C": [np.nan, np.nan, "a", "b"],
        },
        index=[0, 1, 5, 6],
    )
    combined = f.combine_first(g)
    tm.assert_frame_equal(combined, exp)

    exp = DataFrame(
        {
            "A": ["a", "b", np.nan, np.nan],
            "B": [0, 1, -1, 0],
            "C": [np.nan, np.nan, "a", "b"],
        },
        index=[0, 1, 5, 6],
    )
    combined = f.combine_first(g, preserve_dtypes=True)
    tm.assert_frame_equal(combined, exp)
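The frames in `test_combine_preserve_dtypes` can be rebuilt outside the test suite to inspect the resulting dtypes directly. The sketch below is only an illustration: the names `left` and `right` are hypothetical stand-ins for `f` and `g`, and it calls `combine_first` without the `preserve_dtypes` keyword, since the commit list indicates that argument was removed later in the PR. The dtype printed for column `B` presumably depends on the installed pandas version, so no particular value is claimed here.

```python
import pandas as pd

# Hypothetical names standing in for `f` and `g` from the test above.
left = pd.DataFrame(
    {
        "A": pd.Series(["a", "b"], index=range(2)),
        "B": pd.Series(range(2), index=range(2)),
    }
)
right = pd.DataFrame(
    {
        "B": pd.Series(range(-1, 1), index=range(5, 7)),
        "C": pd.Series(["a", "b"], index=range(5, 7)),
    }
)

# Rows of `left` take priority; `right` fills labels missing from `left`.
combined = left.combine_first(right)

# Column "B" exists in both inputs with an integer dtype, so it is the
# column whose dtype this PR is concerned with.
print(combined)
print(combined.dtypes)
```

Only column `B` is shared by both inputs and has no missing values after the union of the two indexes, which is why it is the column the test uses to check dtype preservation.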
We do not want to add a flag for this; simply change it.
Please add some examples for this behavior.
I was thinking that maybe it is a good idea to keep the current behavior as default, and provide the new behavior as an option
No, it's better to just fix this; you can add a whatsnew note in 1.3. Everywhere else we cast to common dtypes, so this should be no different.
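As a rough illustration of what "cast to common dtypes" means elsewhere in the library, the sketch below uses `pd.concat` and `np.result_type`. The variable names are hypothetical and the example is not taken from the PR; it only shows that mixing integer and float data yields a float result, while data sharing one dtype keeps it.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2], dtype="int64")
floats = pd.Series([0.5, 1.5], dtype="float64")

# NumPy's promotion rule for the two dtypes involved.
common = np.result_type(ints.dtype, floats.dtype)

# Concatenating mixed int/float data upcasts to the common dtype.
stacked = pd.concat([ints, floats], ignore_index=True)

# Concatenating data that already shares a dtype leaves it unchanged.
same = pd.concat([ints, ints], ignore_index=True)

print(common, stacked.dtype, same.dtype)
```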
I'm running into some failed tests that exceed my understanding of the lib. Is it expected that if a Series is constructed from a list of None, then the result of combining it with some other Series should have the latter's dtype (coercing to the respective NaN value)?
list of None -> object, so combined -> object
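The first half of that reply can be checked directly. The following is a small sketch with hypothetical names; it confirms that a Series built from a list of `None` is stored as `object`, and it prints the dtype after `combine_first` without asserting it, since that result may differ between pandas versions.

```python
import pandas as pd

# A Series built only from None values has no numeric data to infer from.
all_none = pd.Series([None, None])
other = pd.Series([1, 2])

# Every entry of `all_none` is missing, so all values come from `other`.
combined = all_none.combine_first(other)

print(all_none.dtype)
print(combined.dtype)  # version-dependent; shown for inspection only
print(combined.tolist())
```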