Commit 1a90629

Merge branch 'master' into series_rolling_count_ignores_min_periods
2 parents bfe10f0 + bbcda98 commit 1a90629

105 files changed (+1920, -1310 lines)

asv_bench/benchmarks/reshape.py

Lines changed: 3 additions & 0 deletions
@@ -161,6 +161,9 @@ def time_pivot_table_categorical_observed(self):
             observed=True,
         )

+    def time_pivot_table_margins_only_column(self):
+        self.df.pivot_table(columns=["key2", "key3"], margins=True)
+

 class Crosstab:
     def setup(self):

ci/deps/azure-37-locale.yaml

Lines changed: 3 additions & 0 deletions
@@ -34,3 +34,6 @@ dependencies:
   - xlsxwriter
   - xlwt
   - pyarrow>=0.15
+  - pip
+  - pip:
+    - pyxlsb

ci/deps/azure-macos-36.yaml

Lines changed: 1 addition & 0 deletions
@@ -33,3 +33,4 @@ dependencies:
   - pip
   - pip:
     - pyreadstat
+    - pyxlsb

ci/deps/azure-windows-37.yaml

Lines changed: 3 additions & 0 deletions
@@ -35,3 +35,6 @@ dependencies:
   - xlsxwriter
   - xlwt
   - pyreadstat
+  - pip
+  - pip:
+    - pyxlsb

ci/deps/travis-36-cov.yaml

Lines changed: 1 addition & 0 deletions
@@ -51,3 +51,4 @@ dependencies:
     - coverage
     - pandas-datareader
     - python-dateutil
+    - pyxlsb

ci/print_skipped.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 import os
 import xml.etree.ElementTree as et


doc/make.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 """
 Python script for building documentation.


doc/source/getting_started/install.rst

Lines changed: 1 addition & 0 deletions
@@ -264,6 +264,7 @@ pyarrow 0.12.0 Parquet, ORC (requires 0.13.0), and
 pymysql                   0.7.11             MySQL engine for sqlalchemy
 pyreadstat                                   SPSS files (.sav) reading
 pytables                  3.4.2              HDF5 reading / writing
+pyxlsb                    1.0.5              Reading for xlsb files
 qtpy                                         Clipboard I/O
 s3fs                      0.3.0              Amazon S3 access
 tabulate                  0.8.3              Printing in Markdown-friendly format (see `tabulate`_)

doc/source/user_guide/io.rst

Lines changed: 70 additions & 42 deletions
@@ -23,7 +23,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
     text;`JSON <https://www.json.org/>`__;:ref:`read_json<io.json_reader>`;:ref:`to_json<io.json_writer>`
     text;`HTML <https://en.wikipedia.org/wiki/HTML>`__;:ref:`read_html<io.read_html>`;:ref:`to_html<io.html>`
     text; Local clipboard;:ref:`read_clipboard<io.clipboard>`;:ref:`to_clipboard<io.clipboard>`
-    binary;`MS Excel <https://en.wikipedia.org/wiki/Microsoft_Excel>`__;:ref:`read_excel<io.excel_reader>`;:ref:`to_excel<io.excel_writer>`
+    ;`MS Excel <https://en.wikipedia.org/wiki/Microsoft_Excel>`__;:ref:`read_excel<io.excel_reader>`;:ref:`to_excel<io.excel_writer>`
     binary;`OpenDocument <http://www.opendocumentformat.org>`__;:ref:`read_excel<io.ods>`;
     binary;`HDF5 Format <https://support.hdfgroup.org/HDF5/whatishdf5.html>`__;:ref:`read_hdf<io.hdf5>`;:ref:`to_hdf<io.hdf5>`
     binary;`Feather Format <https://github.com/wesm/feather>`__;:ref:`read_feather<io.feather>`;:ref:`to_feather<io.feather>`
@@ -2768,7 +2768,8 @@ Excel files

 The :func:`~pandas.read_excel` method can read Excel 2003 (``.xls``)
 files using the ``xlrd`` Python module. Excel 2007+ (``.xlsx``) files
-can be read using either ``xlrd`` or ``openpyxl``.
+can be read using either ``xlrd`` or ``openpyxl``. Binary Excel (``.xlsb``)
+files can be read using ``pyxlsb``.
 The :meth:`~DataFrame.to_excel` instance method is used for
 saving a ``DataFrame`` to Excel. Generally the semantics are
 similar to working with :ref:`csv<io.read_csv_table>` data.
@@ -3229,6 +3230,30 @@ OpenDocument spreadsheets match what can be done for `Excel files`_ using
    Currently pandas only supports *reading* OpenDocument spreadsheets. Writing
    is not implemented.

+.. _io.xlsb:
+
+Binary Excel (.xlsb) files
+--------------------------
+
+.. versionadded:: 1.0.0
+
+The :func:`~pandas.read_excel` method can also read binary Excel files
+using the ``pyxlsb`` module. The semantics and features for reading
+binary Excel files mostly match what can be done for `Excel files`_ using
+``engine='pyxlsb'``. ``pyxlsb`` does not recognize datetime types
+in files and will return floats instead.
+
+.. code-block:: python
+
+   # Returns a DataFrame
+   pd.read_excel('path_to_file.xlsb', engine='pyxlsb')
+
+.. note::
+
+   Currently pandas only supports *reading* binary Excel files. Writing
+   is not implemented.
+
+
 .. _io.clipboard:

 Clipboard
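Note on the hunk above: the new documentation says ``pyxlsb`` returns floats where a workbook holds datetimes. The following is a minimal sketch of turning such a float back into a datetime. It assumes the workbook uses Excel's default 1900 date system, in which serial values count days from 1899-12-30; the helper name ``excel_serial_to_datetime`` is hypothetical and is not part of pandas or ``pyxlsb``.

```python
from datetime import datetime, timedelta

# Assumption: the workbook uses Excel's default 1900 date system, where
# serial values count days from 1899-12-30 (valid for dates after Feb 1900).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial day number to a datetime (hypothetical helper)."""
    # The integer part is whole days, the fractional part is the time of day.
    return EXCEL_EPOCH + timedelta(days=serial)


print(excel_serial_to_datetime(43831.5))
```

For a whole column already loaded into a DataFrame, ``pd.to_datetime(values, unit="D", origin="1899-12-30")`` should perform the same conversion under the same date-system assumption.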
@@ -4220,46 +4245,49 @@ Compression
 all kinds of stores, not just tables. Two parameters are used to
 control compression: ``complevel`` and ``complib``.

-``complevel`` specifies if and how hard data is to be compressed.
-``complevel=0`` and ``complevel=None`` disables
-compression and ``0<complevel<10`` enables compression.
-
-``complib`` specifies which compression library to use. If nothing is
-specified the default library ``zlib`` is used. A
-compression library usually optimizes for either good
-compression rates or speed and the results will depend on
-the type of data. Which type of
-compression to choose depends on your specific needs and
-data. The list of supported compression libraries:
-
-- `zlib <https://zlib.net/>`_: The default compression library. A classic in terms of compression, achieves good compression rates but is somewhat slow.
-- `lzo <https://www.oberhumer.com/opensource/lzo/>`_: Fast compression and decompression.
-- `bzip2 <http://bzip.org/>`_: Good compression rates.
-- `blosc <http://www.blosc.org/>`_: Fast compression and decompression.
-
-Support for alternative blosc compressors:
-
-- `blosc:blosclz <http://www.blosc.org/>`_ This is the
-default compressor for ``blosc``
-- `blosc:lz4
-<https://fastcompression.blogspot.dk/p/lz4.html>`_:
-A compact, very popular and fast compressor.
-- `blosc:lz4hc
-<https://fastcompression.blogspot.dk/p/lz4.html>`_:
-A tweaked version of LZ4, produces better
-compression ratios at the expense of speed.
-- `blosc:snappy <https://google.github.io/snappy/>`_:
-A popular compressor used in many places.
-- `blosc:zlib <https://zlib.net/>`_: A classic;
-somewhat slower than the previous ones, but
-achieving better compression ratios.
-- `blosc:zstd <https://facebook.github.io/zstd/>`_: An
-extremely well balanced codec; it provides the best
-compression ratios among the others above, and at
-reasonably fast speed.
-
-If ``complib`` is defined as something other than the
-listed libraries a ``ValueError`` exception is issued.
+* ``complevel`` specifies if and how hard data is to be compressed.
+``complevel=0`` and ``complevel=None`` disables compression and
+``0<complevel<10`` enables compression.
+
+* ``complib`` specifies which compression library to use.
+If nothing is specified the default library ``zlib`` is used. A
+compression library usually optimizes for either good compression rates
+or speed and the results will depend on the type of data. Which type of
+compression to choose depends on your specific needs and data. The list
+of supported compression libraries:
+
+- `zlib <https://zlib.net/>`_: The default compression library.
+A classic in terms of compression, achieves good compression
+rates but is somewhat slow.
+- `lzo <https://www.oberhumer.com/opensource/lzo/>`_: Fast
+compression and decompression.
+- `bzip2 <http://bzip.org/>`_: Good compression rates.
+- `blosc <http://www.blosc.org/>`_: Fast compression and
+decompression.
+
+Support for alternative blosc compressors:
+
+- `blosc:blosclz <http://www.blosc.org/>`_ This is the
+default compressor for ``blosc``
+- `blosc:lz4
+<https://fastcompression.blogspot.dk/p/lz4.html>`_:
+A compact, very popular and fast compressor.
+- `blosc:lz4hc
+<https://fastcompression.blogspot.dk/p/lz4.html>`_:
+A tweaked version of LZ4, produces better
+compression ratios at the expense of speed.
+- `blosc:snappy <https://google.github.io/snappy/>`_:
+A popular compressor used in many places.
+- `blosc:zlib <https://zlib.net/>`_: A classic;
+somewhat slower than the previous ones, but
+achieving better compression ratios.
+- `blosc:zstd <https://facebook.github.io/zstd/>`_: An
+extremely well balanced codec; it provides the best
+compression ratios among the others above, and at
+reasonably fast speed.
+
+If ``complib`` is defined as something other than the listed libraries a
+``ValueError`` exception is issued.

 .. note::

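Note on the hunk above: the reflowed text says that a ``complevel`` of 0 disables compression, that levels 1 to 9 enable it, and that ``zlib`` is the default ``complib``. The sketch below uses Python's standard ``zlib`` module directly (not pandas or PyTables) to show what the level does to output size. The sample payload and the helper name ``compressed_size`` are made up for the illustration.

```python
import zlib

# Made-up, highly repetitive sample payload (an assumption for illustration).
payload = b"pandas HDFStore compression example " * 1000


def compressed_size(data: bytes, level: int) -> int:
    """Return the size in bytes of ``data`` after zlib compression at ``level``."""
    return len(zlib.compress(data, level))


# Level 0 stores the data uncompressed (plus a small framing overhead);
# levels 1 to 9 trade speed for a smaller result.
for level in (0, 1, 9):
    print(f"level={level}: {len(payload)} -> {compressed_size(payload, level)} bytes")
```

In pandas itself the corresponding call would look something like ``df.to_hdf("store.h5", key="df", complevel=9, complib="zlib")``, which requires PyTables to be installed.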

doc/source/whatsnew/v1.0.0.rst

Lines changed: 2 additions & 1 deletion
@@ -215,7 +215,8 @@ Other enhancements
 - :meth:`Styler.format` added the ``na_rep`` parameter to help format the missing values (:issue:`21527`, :issue:`28358`)
 - Roundtripping DataFrames with nullable integer, string and period data types to parquet
   (:meth:`~DataFrame.to_parquet` / :func:`read_parquet`) using the `'pyarrow'` engine
-  now preserve those data types with pyarrow >= 0.16.0 (:issue:`20612`, :issue:`28371`).
+  now preserve those data types with pyarrow >= 1.0.0 (:issue:`20612`).
+- :func:`read_excel` now can read binary Excel (``.xlsb``) files by passing ``engine='pyxlsb'``. For more details and example usage, see the :ref:`Binary Excel files documentation <io.xlsb>`. Closes :issue:`8540`.
 - The ``partition_cols`` argument in :meth:`DataFrame.to_parquet` now accepts a string (:issue:`27117`)
 - :func:`pandas.read_json` now parses ``NaN``, ``Infinity`` and ``-Infinity`` (:issue:`12213`)
 - :func:`to_parquet` now appropriately handles the ``schema`` argument for user defined schemas in the pyarrow engine. (:issue:`30270`)
