
Commit 13146e1

ENH: add calamine excel reader (close #50395)
Co-author: Kostya Farber (#50581)
1 parent faeedad commit 13146e1

20 files changed, +232 -53 lines changed

ci/deps/actions-310.yaml

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ dependencies:
  - pymysql>=1.0.2
  - pyreadstat>=1.1.5
  - pytables>=3.7.0
+ - python-calamine>=0.1.6
  - pyxlsb>=1.0.9
  - s3fs>=2022.05.0
  - scipy>=1.8.1

ci/deps/actions-311-downstream_compat.yaml

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ dependencies:
  - pymysql>=1.0.2
  - pyreadstat>=1.1.5
  - pytables>=3.7.0
+ - python-calamine>=0.1.6
  - pyxlsb>=1.0.9
  - s3fs>=2022.05.0
  - scipy>=1.8.1

ci/deps/actions-311.yaml

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ dependencies:
  - pymysql>=1.0.2
  - pyreadstat>=1.1.5
  # - pytables>=3.7.0, 3.8.0 is first version that supports 3.11
+ - python-calamine>=0.1.6
  - pyxlsb>=1.0.9
  - s3fs>=2022.05.0
  - scipy>=1.8.1

ci/deps/actions-39-minimum_versions.yaml

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ dependencies:
  - pymysql=1.0.2
  - pyreadstat=1.1.5
  - pytables=3.7.0
+ - python-calamine=0.1.6
  - pyxlsb=1.0.9
  - s3fs=2022.05.0
  - scipy=1.8.1

ci/deps/actions-39.yaml

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ dependencies:
  - pymysql>=1.0.2
  - pyreadstat>=1.1.5
  - pytables>=3.7.0
+ - python-calamine>=0.1.6
  - pyxlsb>=1.0.9
  - s3fs>=2022.05.0
  - scipy>=1.8.1

ci/deps/circle-310-arm64.yaml

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ dependencies:
  - pymysql>=1.0.2
  # - pyreadstat>=1.1.5 not available on ARM
  - pytables>=3.7.0
+ - python-calamine>=0.1.6
  - pyxlsb>=1.0.9
  - s3fs>=2022.05.0
  - scipy>=1.8.1

doc/source/getting_started/install.rst

Lines changed: 1 addition & 0 deletions
@@ -281,6 +281,7 @@ xlrd 2.0.1 excel Reading Excel
  xlsxwriter                3.0.3              excel           Writing Excel
  openpyxl                  3.0.10             excel           Reading / writing for xlsx files
  pyxlsb                    1.0.9              excel           Reading for xlsb files
+ python-calamine           0.1.6              excel           Reading for xls/xlsx/xlsb/ods files
  ========================= ================== =============== =============================================================

  HTML

doc/source/user_guide/io.rst

Lines changed: 23 additions & 2 deletions
@@ -3453,7 +3453,8 @@ Excel files
  The :func:`~pandas.read_excel` method can read Excel 2007+ (``.xlsx``) files
  using the ``openpyxl`` Python module. Excel 2003 (``.xls``) files
  can be read using ``xlrd``. Binary Excel (``.xlsb``)
- files can be read using ``pyxlsb``.
+ files can be read using ``pyxlsb``. All formats can be read
+ using the :ref:`calamine<io.calamine>` engine.
  The :meth:`~DataFrame.to_excel` instance method is used for
  saving a ``DataFrame`` to Excel. Generally the semantics are
  similar to working with :ref:`csv<io.read_csv_table>` data.
@@ -3494,6 +3495,9 @@ using internally.

  * For the engine odf, pandas is using :func:`odf.opendocument.load` to read in (``.ods``) files.

+ * For the engine calamine, pandas is using :func:`python_calamine.load_workbook`
+   to read in (``.xlsx``), (``.xlsm``), (``.xls``), (``.xlsb``), (``.ods``) files.
+
  .. code-block:: python

     # Returns a DataFrame
@@ -3935,7 +3939,8 @@ The :func:`~pandas.read_excel` method can also read binary Excel files
  using the ``pyxlsb`` module. The semantics and features for reading
  binary Excel files mostly match what can be done for `Excel files`_ using
  ``engine='pyxlsb'``. ``pyxlsb`` does not recognize datetime types
- in files and will return floats instead.
+ in files and will return floats instead (you can use :ref:`calamine<io.calamine>`
+ if you need to recognize datetime types).

  .. code-block:: python

@@ -3947,6 +3952,22 @@ in files and will return floats instead.
  Currently pandas only supports *reading* binary Excel files. Writing
  is not implemented.

+ .. _io.calamine:
+
+ Calamine (Excel and ODS files)
+ ------------------------------
+
+ The :func:`~pandas.read_excel` method can read Excel files (``.xlsx``, ``.xlsm``, ``.xls``, ``.xlsb``)
+ and OpenDocument spreadsheets (``.ods``) using the ``python-calamine`` module.
+ This module is a binding for the Rust library `calamine <https://crates.io/crates/calamine>`__
+ and is faster than other engines in most cases. The semantics and features for reading files
+ match what can be done for `Excel files`_ using ``engine='calamine'``.
+ The optional dependency 'python-calamine' needs to be installed.
+
+ .. code-block:: python
+
+    # Returns a DataFrame
+    pd.read_excel("path_to_file.xlsb", engine="calamine")

  .. _io.clipboard:

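Note on the io.rst change above: the docs now describe one engine that covers every spreadsheet format, next to the existing per-format engines. A minimal sketch of how calling code might pick between them is given here. The helper name ``choose_excel_engine`` and its ``calamine_available`` parameter are hypothetical and not part of this commit or of pandas; the ``.xlsm`` fallback to ``openpyxl`` is an assumption of the sketch.

```python
import importlib.util
from pathlib import Path

# Per-format engines named in the io.rst text of this commit. The ".xlsm"
# entry is an assumption of this sketch, not something the diff states.
PER_FORMAT_ENGINES = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
    ".xlsb": "pyxlsb",
    ".ods": "odf",
}


def choose_excel_engine(path, calamine_available=None):
    """Hypothetical helper: return an ``engine`` name for ``pd.read_excel``.

    Prefers "calamine" when the optional python-calamine dependency is
    installed, otherwise falls back to the per-format engine.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in PER_FORMAT_ENGINES:
        raise ValueError(f"unsupported spreadsheet extension: {suffix!r}")
    if calamine_available is None:
        # python-calamine is imported under the module name ``python_calamine``.
        calamine_available = importlib.util.find_spec("python_calamine") is not None
    return "calamine" if calamine_available else PER_FORMAT_ENGINES[suffix]
```

With pandas imported as ``pd``, the result could be passed straight through, for example ``pd.read_excel(path, engine=choose_excel_engine(path))``. Keep in mind that the engines are not fully interchangeable: per the docs above, ``pyxlsb`` returns floats where ``calamine`` recognizes datetime types.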

doc/source/whatsnew/v2.2.0.rst

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ enhancement2

  Other enhancements
  ^^^^^^^^^^^^^^^^^^
- -
+ - Added ``calamine`` as an engine to ``read_excel`` (:issue:`50395`)
  -

  .. ---------------------------------------------------------------------------

environment.yml

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ dependencies:
  - pymysql>=1.0.2
  - pyreadstat>=1.1.5
  - pytables>=3.7.0
+ - python-calamine>=0.1.6
  - pyxlsb>=1.0.9
  - s3fs>=2022.05.0
  - scipy>=1.8.1
