Skip to content

DOC: Improve reshape\concat #47061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
May 27, 2022
Merged

DOC: Improve reshape\concat #47061

merged 20 commits into from
May 27, 2022

Conversation

anetakahle
Copy link
Contributor

@pep8speaks
Copy link

pep8speaks commented May 19, 2022

Hello @anetakahle! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-05-27 10:26:04 UTC

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @anetakahle , thanks for taking this on

Instead of this example, could you add an example of how to append a single row (e.g. here), which seems to be the most common source of confusion?

@anetakahle anetakahle force-pushed the doc-concat branch 2 times, most recently from 758a4aa to 9dfef3a Compare May 20, 2022 07:09
@anetakahle
Copy link
Contributor Author

@MarcoGorelli thanks for a quick review :)
I've updated the example, is the current version better?

Comment on lines 354 to 359
>>> b = pd.DataFrame({"A": 3}, index=[0])
>>> b
A
0 3
>>> for rowIndex, row in b.iterrows():
>>> print(pd.concat([a, row.to_frame().T], ignore_index=True))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it work to make this

new_row = pd.Series([3])

and then just do

pd.concat([a, new_row.to_frame().T], ignore_index=True)

?
Iterating over rows is what we want to avoid

Also, let's rename a to df7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as requested in 90bafc6

Is it ok that the data from the new_row have appeared in a new row and not in the first one as in the previous solution?

I've also read the discussion about the deprecated append and maybe we should add some note into the documentation as well, something like

It is not recomended to build DataFrames by adding single rows in a
not loop. Build a list of rows and make a DataFrame in a single concat.

Co-Authored-By: Matěj Štágl <[email protected]>
@MarcoGorelli
Copy link
Member

I've also read the discussion about the deprecated append and maybe we should add some note into the documentation as well, something like

Yes, good suggestion, thanks!

anetakahle and others added 2 commits May 20, 2022 19:14
Co-Authored-By: Matěj Štágl <[email protected]>
Co-Authored-By: Matěj Štágl <[email protected]>
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getting there 💪

Co-Authored-By: Matěj Štágl <[email protected]>
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of unrelated changes, this is showing 15 files changed - could you rebase onto main and only modify what's necessary?

Also, please run pre-commit on the file(s) you've changed (see https://pandas.pydata.org/docs/development/contributing_codebase.html#pre-commit)

@anetakahle
Copy link
Contributor Author

anetakahle commented May 20, 2022

@MarcoGorelli fixed, shows only 1 file changed now as intended

@MarcoGorelli MarcoGorelli self-requested a review May 20, 2022 20:45
@anetakahle
Copy link
Contributor Author

@MarcoGorelli I also ran the pre-commit locally and it didn't fail.

But there are still some other fails here on GitHub, but some of them are on the document as a whole, not just my changes (for example in the case of Code Checks / Docstring and typing validation (pull_request) the test shows to some errors which are not in the code I posted)

@MarcoGorelli
Copy link
Member

are you sure you committed your latest changes? can you show the output of:

pre-commit run --files pandas/core/reshape/concat.py
git remote -v
git fetch origin
git diff origin/doc-concat

please?

@anetakahle
Copy link
Contributor Author

@MarcoGorelli
Sure!

Here is the output:
pre-commit run --files pandas/core/reshape/concat.py
git remote -v
git fetch origin - no output
git diff origin/doc-concat

@MarcoGorelli
Copy link
Member

MarcoGorelli commented May 21, 2022

OK looks like you have lots of local unstaged changes (perhaps from a merge gone not quite right?)

I think the simplest way out would be:

git checkout -- .  # discard local changes
git reset --hard origin/doc-concat
pre-commit run --files pandas/core/reshape/concat.py
git commit -m 'lint file'
git push origin doc-concat

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 no worries - looks like you've added an extra level on identation to the docstring though, could you restore the original level of indentation please?

@anetakahle
Copy link
Contributor Author

@MarcoGorelli
The extra level on indentation was added by the pre-commit.
But I think I fixed it. 😄

@jreback jreback added the Docs label May 21, 2022
@anetakahle
Copy link
Contributor Author

@MarcoGorelli
Resolved. :)

@anetakahle
Copy link
Contributor Author

@MarcoGorelli
OMG, the pre-commit has done so much mess in there :D
I fixed it, I hope it's all ok and I can squash now? :)

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, but there's still a doctest failure:

_________________ [doctest] pandas.core.reshape.concat.concat __________________
350     ValueError: Indexes have overlapping values: ['a']
351 
352     Append a single row to the end of a ``DataFrame`` object.
353 
354     >>> df7 = pd.DataFrame({'a': 1, 'b': 2}, index=[0])
355     >>> df7
356         a   b
357     0   1   2
358     >>> new_row = pd.Series({'a': 3, 'b': 4})
359     >>> new_row
Expected:
        a   3
        b   4
Got:
    a    3
    b    4
    dtype: int64

(and no need to squash the commits)

@anetakahle
Copy link
Contributor Author

anetakahle commented May 22, 2022

@MarcoGorelli
Ok, I think I fixed it,
but the Code Checks / Docstring and typing validation test is still failing and I don't know why :/ :D
Maybe because of some code I didn't add?

FAILED pandas/core/tools/datetimes.py::pandas.core.tools.datetimes.to_datetime
...
=================================== FAILURES ===================================
______________ [doctest] pandas.core.tools.datetimes.to_datetime _______________
1013 
1014     >>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
1015     ...                utc=True)
1016     DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
1017                   dtype='datetime64[ns, UTC]', freq=None)
1018 
1019     - Inputs can contain both naive and aware, string or datetime, the above
1020       rules still apply
1021 
1022     >>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 12:00 -0530',
UNEXPECTED EXCEPTION: NameError("name 'timezone' is not defined")
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/doctest.py", line [133](https://github.com/pandas-dev/pandas/runs/6543790014?check_suite_focus=true#step:10:134)6, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pandas.core.tools.datetimes.to_datetime[18]>", line 4, in <module>
NameError: name 'timezone' is not defined
/home/runner/work/pandas/pandas/pandas/core/tools/datetimes.py:1022: UnexpectedException

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

The other failure looks unrelated, probably needs to be fixed on main first - I'll rebase later to make sure

@MarcoGorelli MarcoGorelli added this to the 1.5 milestone May 22, 2022
@anetakahle
Copy link
Contributor Author

@MarcoGorelli
Thank you😁💜☺️

@michalkahle
Copy link

Congratulations! 😉

@anetakahle anetakahle requested a review from MarcoGorelli May 24, 2022 14:15
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, looks like there's still a related failure:

Error: /home/runner/work/pandas/pandas/pandas/core/reshape/concat.py:146:GL03:pandas.concat:Double line break found; please use only one blank line to separate sections or paragraphs, and do not leave blank lines at the end of docstrings

@anetakahle
Copy link
Contributor Author

@MarcoGorelli
Oh, I didn't notice.
I changed it.

Thank you :)

@jreback jreback merged commit 17c6e2b into pandas-dev:main May 27, 2022
@jreback
Copy link
Contributor

jreback commented May 27, 2022

thanks @anetakahle

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: write guide for how to replace append
5 participants