DOC: Improve reshape\concat #47061
anetakahle
commented
May 19, 2022
- closes DOC: write guide for how to replace append #46825
- All code checks passed.
Hello @anetakahle! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-05-27 10:26:04 UTC
Hey @anetakahle , thanks for taking this on
Instead of this example, could you add an example of how to append a single row (e.g. here), which seems to be the most common source of confusion?
758a4aa to 9dfef3a
@MarcoGorelli thanks for a quick review :)
pandas/core/reshape/concat.py
Outdated
>>> b = pd.DataFrame({"A": 3}, index=[0])
>>> b
   A
0  3
>>> for rowIndex, row in b.iterrows():
...     print(pd.concat([a, row.to_frame().T], ignore_index=True))
Does it work to make this `new_row = pd.Series([3])` and then just do `pd.concat([a, new_row.to_frame().T], ignore_index=True)`?
Iterating over rows is what we want to avoid.
Also, let's rename `a` to `df7`.
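A minimal sketch of the single-row pattern suggested here, assuming a two-column frame like the one in the final docstring (the data and the name `result` are illustrative, not taken from the PR diff). The `Series` index becomes the column labels once it is converted with `to_frame().T`:

```python
import pandas as pd

# Existing one-row frame (illustrative data).
df7 = pd.DataFrame({"a": 1, "b": 2}, index=[0])

# The new row as a Series whose index matches df7's columns.
new_row = pd.Series({"a": 3, "b": 4})

# to_frame() yields a single column; .T turns it into a single row.
result = pd.concat([df7, new_row.to_frame().T], ignore_index=True)
print(result)
```

If the `Series` index does not match the existing columns (as with `pd.Series([3])`, whose only label is `0`), `concat` aligns on labels and adds a new column rather than filling an existing one.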
Changed as requested in 90bafc6
Is it ok that the data from the new_row have appeared in a new row and not in the first one as in the previous solution?
I've also read the discussion about the deprecated append, and maybe we should add a note to the documentation as well, something like:
It is not recommended to build DataFrames by adding single rows in a for loop. Build a list of rows and make a DataFrame in a single concat.
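To illustrate the proposed note, here is a rough sketch under the assumption that the rows arrive as dictionaries (the `records` list and variable names are hypothetical). The pieces are collected in a plain list and concatenated once, instead of growing the frame inside the loop:

```python
import pandas as pd

# Hypothetical records that would otherwise be appended one by one.
records = [{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]

# Collect one-row frames in a plain Python list ...
frames = [pd.DataFrame([record]) for record in records]

# ... and concatenate once at the end instead of on every iteration.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```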
Co-Authored-By: Matěj Štágl <[email protected]>
Yes, good suggestion, thanks!
getting there 💪
Lots of unrelated changes, this is showing 15 files changed - could you rebase onto main and only modify what's necessary?
Also, please run pre-commit on the file(s) you've changed (see https://pandas.pydata.org/docs/development/contributing_codebase.html#pre-commit)
Co-authored-by: Marco Edward Gorelli <[email protected]>
This reverts commit f4e394d.
@MarcoGorelli fixed, shows only 1 file changed now as intended
@MarcoGorelli I also ran pre-commit locally and it didn't fail. There are still some other failures here on GitHub, but some of them concern the document as a whole, not just my changes (for example in the case of
are you sure you committed your latest changes? can you show the output of:
please?
@MarcoGorelli Here is the output:
OK looks like you have lots of local unstaged changes (perhaps from a merge gone not quite right?) I think the simplest way out would be:
😄 no worries - looks like you've added an extra level of indentation to the docstring though, could you restore the original level of indentation please?
@MarcoGorelli
Changes look good, but there's still a doctest failure:
_________________ [doctest] pandas.core.reshape.concat.concat __________________
350 ValueError: Indexes have overlapping values: ['a']
351
352 Append a single row to the end of a ``DataFrame`` object.
353
354 >>> df7 = pd.DataFrame({'a': 1, 'b': 2}, index=[0])
355 >>> df7
356 a b
357 0 1 2
358 >>> new_row = pd.Series({'a': 3, 'b': 4})
359 >>> new_row
Expected:
a 3
b 4
Got:
a 3
b 4
dtype: int64
(and no need to squash the commits)
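The mismatch in the doctest output above comes from the trailing dtype line that a `Series` repr always prints. A small sketch that makes this visible (the variable `last_line` is only for illustration):

```python
import pandas as pd

new_row = pd.Series({"a": 3, "b": 4})

# The repr of a Series ends with a dtype line, so a doctest's expected
# output has to include that line as well.
last_line = repr(new_row).splitlines()[-1]
print(last_line)
```

Adding the `dtype: int64` line to the expected output in the docstring should make the doctest match what pandas actually prints.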
@MarcoGorelli
@MarcoGorelli
Congratulations! 😉
Apologies, looks like there's still a related failure:
Error: /home/runner/work/pandas/pandas/pandas/core/reshape/concat.py:146:GL03:pandas.concat:Double line break found; please use only one blank line to separate sections or paragraphs, and do not leave blank lines at the end of docstrings
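For a quick local check of this kind of problem, one could use something like the sketch below. It is a hypothetical helper, not the validator that produces GL03, and it only looks for two consecutive blank lines inside a docstring:

```python
def has_double_blank_line(doc: str) -> bool:
    """Return True if two consecutive lines of ``doc`` are blank."""
    lines = doc.splitlines()
    # Compare each line with its successor; whitespace-only counts as blank.
    return any(
        not first.strip() and not second.strip()
        for first, second in zip(lines, lines[1:])
    )


# Illustrative docstrings: one blank line is fine, two in a row is not.
good_doc = "Summary.\n\nAppend a single row."
bad_doc = "Summary.\n\n\nAppend a single row."
print(has_double_blank_line(good_doc), has_double_blank_line(bad_doc))
```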
@MarcoGorelli Thank you :)
thanks @anetakahle