Commit 577b329

Merge remote-tracking branch 'upstream/main' into regr-concat-empty-2
2 parents 5ed7dad + 89578fe commit 577b329

101 files changed: +1756 -484 lines changed

.github/actions/build_pandas/action.yml

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -17,4 +17,6 @@ runs:
1717
shell: bash -el {0}
1818
env:
1919
# Cannot use parallel compilation on Windows, see https://github.com/pandas-dev/pandas/issues/30873
20-
N_JOBS: ${{ runner.os == 'Windows' && 1 || 2 }}
20+
# GH 47305: Parallel build causes flaky ImportError: /home/runner/work/pandas/pandas/pandas/_libs/tslibs/timestamps.cpython-38-x86_64-linux-gnu.so: undefined symbol: pandas_datetime_to_datetimestruct
21+
N_JOBS: 1
22+
#N_JOBS: ${{ runner.os == 'Windows' && 1 || 2 }}
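
The commented-out `N_JOBS` expression relies on the `&&` / `||` short-circuit idiom that GitHub Actions expressions use in place of a ternary operator. Python's `and` / `or` short-circuit in the same way, so the selection logic can be sketched as below. `n_jobs` is a hypothetical helper used only for illustration; it is not part of the workflow.

```python
def n_jobs(runner_os: str) -> int:
    """Mimic `${{ runner.os == 'Windows' && 1 || 2 }}` with Python's and/or."""
    # `cond and a or b` yields `a` when cond is truthy (and `a` is truthy),
    # otherwise it falls through to `b`.
    return (runner_os == "Windows" and 1) or 2


print(n_jobs("Windows"))  # 1
print(n_jobs("Linux"))    # 2
```

In this Python sketch the idiom only behaves like a ternary while the middle value is truthy; with a falsy middle value such as `0`, the result would fall through to the right-hand operand.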

.github/actions/run-tests/action.yml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
name: Run tests and report results
2+
runs:
3+
using: composite
4+
steps:
5+
- name: Test
6+
run: ci/run_tests.sh
7+
shell: bash -el {0}
8+
9+
- name: Publish test results
10+
uses: actions/upload-artifact@v2
11+
with:
12+
name: Test results
13+
path: test-data.xml
14+
if: failure()
15+
16+
- name: Report Coverage
17+
run: coverage report -m
18+
shell: bash -el {0}
19+
if: failure()
20+
21+
- name: Upload coverage to Codecov
22+
uses: codecov/codecov-action@v2
23+
with:
24+
flags: unittests
25+
name: codecov-pandas
26+
fail_ci_if_error: false
27+
if: failure()

.github/workflows/macos-windows.yml

Lines changed: 1 addition & 15 deletions
@@ -53,18 +53,4 @@ jobs:
5353
uses: ./.github/actions/build_pandas
5454

5555
- name: Test
56-
run: ci/run_tests.sh
57-
58-
- name: Publish test results
59-
uses: actions/upload-artifact@v3
60-
with:
61-
name: Test results
62-
path: test-data.xml
63-
if: failure()
64-
65-
- name: Upload coverage to Codecov
66-
uses: codecov/codecov-action@v2
67-
with:
68-
flags: unittests
69-
name: codecov-pandas
70-
fail_ci_if_error: false
56+
uses: ./.github/actions/run-tests

.github/workflows/posix.yml

Lines changed: 1 addition & 18 deletions
@@ -157,23 +157,6 @@ jobs:
157157
uses: ./.github/actions/build_pandas
158158

159159
- name: Test
160-
run: ci/run_tests.sh
160+
uses: ./.github/actions/run-tests
161161
# TODO: Don't continue on error for PyPy
162162
continue-on-error: ${{ env.IS_PYPY == 'true' }}
163-
164-
- name: Build Version
165-
run: conda list
166-
167-
- name: Publish test results
168-
uses: actions/upload-artifact@v3
169-
with:
170-
name: Test results
171-
path: test-data.xml
172-
if: failure()
173-
174-
- name: Upload coverage to Codecov
175-
uses: codecov/codecov-action@v2
176-
with:
177-
flags: unittests
178-
name: codecov-pandas
179-
fail_ci_if_error: false

.github/workflows/python-dev.yml

Lines changed: 10 additions & 30 deletions
@@ -57,40 +57,20 @@ jobs:
5757
- name: Install dependencies
5858
shell: bash -el {0}
5959
run: |
60-
python -m pip install --upgrade pip setuptools wheel
61-
pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
62-
pip install git+https://github.com/nedbat/coveragepy.git
63-
pip install cython python-dateutil pytz hypothesis pytest>=6.2.5 pytest-xdist pytest-cov
64-
pip list
60+
python3 -m pip install --upgrade pip setuptools wheel
61+
python3 -m pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
62+
python3 -m pip install git+https://github.com/nedbat/coveragepy.git
63+
python3 -m pip install cython python-dateutil pytz hypothesis "pytest>=6.2.5" pytest-xdist pytest-cov "pytest-asyncio>=0.17"
64+
python3 -m pip list
6565
6666
- name: Build Pandas
6767
run: |
68-
python setup.py build_ext -q -j2
69-
python -m pip install -e . --no-build-isolation --no-use-pep517
68+
python3 setup.py build_ext -q -j2
69+
python3 -m pip install -e . --no-build-isolation --no-use-pep517
7070
7171
- name: Build Version
7272
run: |
73-
python -c "import pandas; pandas.show_versions();"
73+
python3 -c "import pandas; pandas.show_versions();"
7474
75-
- name: Test with pytest
76-
shell: bash -el {0}
77-
run: |
78-
ci/run_tests.sh
79-
80-
- name: Publish test results
81-
uses: actions/upload-artifact@v3
82-
with:
83-
name: Test results
84-
path: test-data.xml
85-
if: failure()
86-
87-
- name: Report Coverage
88-
run: |
89-
coverage report -m
90-
91-
- name: Upload coverage to Codecov
92-
uses: codecov/codecov-action@v2
93-
with:
94-
flags: unittests
95-
name: codecov-pandas
96-
fail_ci_if_error: true
75+
- name: Test
76+
uses: ./.github/actions/run-tests
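
Version specifiers such as `pytest>=6.2.5` contain `>`, which bash treats as an output redirection unless the argument is quoted: the package is then installed without the version constraint and output goes to a file named `=6.2.5`. The sketch below uses Python's standard `shlex` module to show how the two spellings tokenize. `shell_tokens` and `has_redirect_token` are hypothetical helper names, and the check is only a heuristic (a quoted standalone `>` would be flagged as well).

```python
import shlex


def shell_tokens(command: str) -> list:
    """Split a command roughly as a POSIX shell would, keeping operators separate."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    return list(lexer)


def has_redirect_token(command: str) -> bool:
    """Heuristic: report whether `>` or `>>` appears as its own token."""
    return any(tok in (">", ">>") for tok in shell_tokens(command))


unquoted = "python3 -m pip install pytest>=6.2.5 pytest-xdist"
quoted = 'python3 -m pip install "pytest>=6.2.5" pytest-xdist'

print(shell_tokens(unquoted))
print(has_redirect_token(unquoted))  # True
print(has_redirect_token(quoted))    # False
```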

doc/source/reference/frame.rst

Lines changed: 1 addition & 0 deletions
@@ -373,6 +373,7 @@ Serialization / IO / conversion
373373

374374
DataFrame.from_dict
375375
DataFrame.from_records
376+
DataFrame.to_orc
376377
DataFrame.to_parquet
377378
DataFrame.to_pickle
378379
DataFrame.to_csv

doc/source/reference/io.rst

Lines changed: 1 addition & 0 deletions
@@ -159,6 +159,7 @@ ORC
159159
:toctree: api/
160160

161161
read_orc
162+
DataFrame.to_orc
162163

163164
SAS
164165
~~~

doc/source/reference/testing.rst

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,7 @@ Exceptions and warnings
3030
errors.DtypeWarning
3131
errors.DuplicateLabelError
3232
errors.EmptyDataError
33+
errors.IndexingError
3334
errors.InvalidIndexError
3435
errors.IntCastingNaNError
3536
errors.MergeError
@@ -45,6 +46,7 @@ Exceptions and warnings
4546
errors.SettingWithCopyError
4647
errors.SettingWithCopyWarning
4748
errors.SpecificationError
49+
errors.UndefinedVariableError
4850
errors.UnsortedIndexError
4951
errors.UnsupportedFunctionCall
5052

doc/source/user_guide/io.rst

Lines changed: 55 additions & 4 deletions
@@ -30,7 +30,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
3030
binary;`HDF5 Format <https://support.hdfgroup.org/HDF5/whatishdf5.html>`__;:ref:`read_hdf<io.hdf5>`;:ref:`to_hdf<io.hdf5>`
3131
binary;`Feather Format <https://github.com/wesm/feather>`__;:ref:`read_feather<io.feather>`;:ref:`to_feather<io.feather>`
3232
binary;`Parquet Format <https://parquet.apache.org/>`__;:ref:`read_parquet<io.parquet>`;:ref:`to_parquet<io.parquet>`
33-
binary;`ORC Format <https://orc.apache.org/>`__;:ref:`read_orc<io.orc>`;
33+
binary;`ORC Format <https://orc.apache.org/>`__;:ref:`read_orc<io.orc>`;:ref:`to_orc<io.orc>`
3434
binary;`Stata <https://en.wikipedia.org/wiki/Stata>`__;:ref:`read_stata<io.stata_reader>`;:ref:`to_stata<io.stata_writer>`
3535
binary;`SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__;:ref:`read_sas<io.sas_reader>`;
3636
binary;`SPSS <https://en.wikipedia.org/wiki/SPSS>`__;:ref:`read_spss<io.spss_reader>`;
@@ -5562,13 +5562,64 @@ ORC
55625562
.. versionadded:: 1.0.0
55635563

55645564
Similar to the :ref:`parquet <io.parquet>` format, the `ORC Format <https://orc.apache.org/>`__ is a binary columnar serialization
5565-
for data frames. It is designed to make reading data frames efficient. pandas provides *only* a reader for the
5566-
ORC format, :func:`~pandas.read_orc`. This requires the `pyarrow <https://arrow.apache.org/docs/python/>`__ library.
5565+
for data frames. It is designed to make reading data frames efficient. pandas provides both the reader and the writer for the
5566+
ORC format, :func:`~pandas.read_orc` and :func:`~pandas.DataFrame.to_orc`. This requires the `pyarrow <https://arrow.apache.org/docs/python/>`__ library.
55675567

55685568
.. warning::
55695569

55705570
* It is *highly recommended* to install pyarrow using conda due to some issues caused by pyarrow.
5571-
* :func:`~pandas.read_orc` is not supported on Windows yet, you can find valid environments on :ref:`install optional dependencies <install.warn_orc>`.
5571+
* :func:`~pandas.DataFrame.to_orc` requires pyarrow>=7.0.0.
5572+
* :func:`~pandas.read_orc` and :func:`~pandas.DataFrame.to_orc` are not supported on Windows yet, you can find valid environments on :ref:`install optional dependencies <install.warn_orc>`.
5573+
* For supported dtypes please refer to `supported ORC features in Arrow <https://arrow.apache.org/docs/cpp/orc.html#data-types>`__.
5574+
* Currently timezones in datetime columns are not preserved when a dataframe is converted into ORC files.
5575+
5576+
.. ipython:: python
5577+
5578+
df = pd.DataFrame(
5579+
{
5580+
"a": list("abc"),
5581+
"b": list(range(1, 4)),
5582+
"c": np.arange(4.0, 7.0, dtype="float64"),
5583+
"d": [True, False, True],
5584+
"e": pd.date_range("20130101", periods=3),
5585+
}
5586+
)
5587+
5588+
df
5589+
df.dtypes
5590+
5591+
Write to an orc file.
5592+
5593+
.. ipython:: python
5594+
:okwarning:
5595+
5596+
df.to_orc("example_pa.orc", engine="pyarrow")
5597+
5598+
Read from an orc file.
5599+
5600+
.. ipython:: python
5601+
:okwarning:
5602+
5603+
result = pd.read_orc("example_pa.orc")
5604+
5605+
result.dtypes
5606+
5607+
Read only certain columns of an orc file.
5608+
5609+
.. ipython:: python
5610+
5611+
result = pd.read_orc(
5612+
"example_pa.orc",
5613+
columns=["a", "b"],
5614+
)
5615+
result.dtypes
5616+
5617+
5618+
.. ipython:: python
5619+
:suppress:
5620+
5621+
os.remove("example_pa.orc")
5622+
55725623
55735624
.. _io.sql:
55745625
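
The warning added in this hunk notes that timezones in datetime columns are not preserved when a dataframe is written to ORC. One possible workaround, sketched here as an assumption rather than anything this commit implements, is to normalize the column to timezone-naive UTC before writing and to reapply the timezone after reading. `strip_timezone` and `restore_timezone` are hypothetical helpers; the sketch exercises only the conversion, so it does not need pyarrow and does not write a file.

```python
import pandas as pd


def strip_timezone(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy whose `column` holds timezone-naive UTC timestamps."""
    out = df.copy()
    out[column] = out[column].dt.tz_convert("UTC").dt.tz_localize(None)
    return out


def restore_timezone(df: pd.DataFrame, column: str, tz: str) -> pd.DataFrame:
    """Reinterpret naive UTC timestamps in `column` and convert them to `tz`."""
    out = df.copy()
    out[column] = out[column].dt.tz_localize("UTC").dt.tz_convert(tz)
    return out


df = pd.DataFrame(
    {"ts": pd.date_range("2013-01-01", periods=3, tz="Europe/Berlin")}
)

# Timezone-naive frame; this is the form that would be handed to to_orc.
naive = strip_timezone(df, "ts")
# After reading back, the caller reapplies the timezone it recorded separately.
restored = restore_timezone(naive, "ts", "Europe/Berlin")

print(naive["ts"].dt.tz)                # None
print(restored["ts"].equals(df["ts"]))  # True
```

The timezone name has to be stored out of band (for example in a sidecar file or a naming convention), since this approach keeps no record of it in the data itself.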

doc/source/whatsnew/v1.4.3.rst

Lines changed: 4 additions & 0 deletions
@@ -15,15 +15,19 @@ including other versions of pandas.
1515
Fixed regressions
1616
~~~~~~~~~~~~~~~~~
1717
- Fixed regression in :meth:`DataFrame.replace` when the replacement value was explicitly ``None`` when passed in a dictionary to ``to_replace`` also casting other columns to object dtype even when there were no values to replace (:issue:`46634`)
18+
- Fixed regression in :meth:`DataFrame.to_csv` raising error when :class:`DataFrame` contains extension dtype categorical column (:issue:`46297`, :issue:`46812`)
19+
- Fixed regression in representation of ``dtypes`` attribute of :class:`MultiIndex` (:issue:`46900`)
1820
- Fixed regression when setting values with :meth:`DataFrame.loc` updating :class:`RangeIndex` when index was set as new column and column was updated afterwards (:issue:`47128`)
1921
- Fixed regression in :meth:`DataFrame.nsmallest` that led to wrong results when ``np.nan`` was in the sorting column (:issue:`46589`)
2022
- Fixed regression in :func:`read_fwf` raising ``ValueError`` when ``widths`` was specified with ``usecols`` (:issue:`46580`)
2123
- Fixed regression in :func:`concat` not sorting columns for mixed column names (:issue:`47127`)
2224
- Fixed regression in :meth:`.Groupby.transform` and :meth:`.Groupby.agg` failing with ``engine="numba"`` when the index was a :class:`MultiIndex` (:issue:`46867`)
25+
- Fixed regression in ``NaN`` comparison for :class:`Index` operations where the same object was compared (:issue:`47105`)
2326
- Fixed regression in :meth:`.Styler.to_latex` and :meth:`.Styler.to_html` where ``buf`` failed in combination with ``encoding`` (:issue:`47053`)
2427
- Fixed regression in :func:`read_csv` with ``index_col=False`` identifying first row as index names when ``header=None`` (:issue:`46955`)
2528
- Fixed regression in :meth:`.DataFrameGroupBy.agg` when used with list-likes or dict-likes and ``axis=1`` that would give incorrect results; now raises ``NotImplementedError`` (:issue:`46995`)
2629
- Fixed regression in :meth:`DataFrame.resample` and :meth:`DataFrame.rolling` when used with list-likes or dict-likes and ``axis=1`` that would raise an unintuitive error message; now raises ``NotImplementedError`` (:issue:`46904`)
30+
- Fixed regression in :func:`assert_index_equal` when ``check_order=False`` and :class:`Index` has extension or object dtype (:issue:`47207`)
2731
- Fixed regression in :func:`read_excel` returning ints as floats on certain input sheets (:issue:`46988`)
2832
- Fixed regression in :meth:`DataFrame.shift` when ``axis`` is ``columns`` and ``fill_value`` is absent, ``freq`` is ignored (:issue:`47039`)
2933
