Commit 30288dc

ENH: add calamine excel reader (close #50395)
Co-author: Kostya Farber (#50581)
1 parent da849a9 commit 30288dc

20 files changed: +238 -53 lines changed

ci/deps/actions-310.yaml

Lines changed: 1 addition & 0 deletions
@@ -58,4 +58,5 @@ dependencies:
 
   - pip:
     - pyqt5>=5.15.6
+    - python-calamine>=0.1.6
     - tzdata>=2022.1

ci/deps/actions-311-downstream_compat.yaml

Lines changed: 1 addition & 0 deletions
@@ -73,4 +73,5 @@ dependencies:
   - pip:
     - dataframe-api-compat>=0.1.7
     - pyqt5>=5.15.6
+    - python-calamine>=0.1.6
     - tzdata>=2022.1

ci/deps/actions-311.yaml

Lines changed: 1 addition & 0 deletions
@@ -58,4 +58,5 @@ dependencies:
 
   - pip:
     - pyqt5>=5.15.6
+    - python-calamine>=0.1.6
     - tzdata>=2022.1

ci/deps/actions-39-minimum_versions.yaml

Lines changed: 1 addition & 0 deletions
@@ -61,4 +61,5 @@ dependencies:
   - pip:
     - dataframe-api-compat==0.1.7
     - pyqt5==5.15.6
+    - python-calamine==0.1.6
     - tzdata==2022.1

ci/deps/actions-39.yaml

Lines changed: 1 addition & 0 deletions
@@ -58,4 +58,5 @@ dependencies:
 
   - pip:
     - pyqt5>=5.15.6
+    - python-calamine>=0.1.6
     - tzdata>=2022.1

ci/deps/circle-310-arm64.yaml

Lines changed: 3 additions & 0 deletions
@@ -56,3 +56,6 @@ dependencies:
   - xlrd>=2.0.1
   - xlsxwriter>=3.0.3
   - zstandard>=0.17.0
+
+  - pip:
+    - python-calamine>=0.1.6

doc/source/getting_started/install.rst

Lines changed: 1 addition & 0 deletions
@@ -281,6 +281,7 @@ xlrd 2.0.1 excel Reading Excel
 xlsxwriter                3.0.3              excel           Writing Excel
 openpyxl                  3.0.10             excel           Reading / writing for xlsx files
 pyxlsb                    1.0.9              excel           Reading for xlsb files
+python-calamine           0.1.6              excel           Reading for xls/xlsx/xlsb/ods files
 ========================= ================== =============== =============================================================
 
 HTML

doc/source/user_guide/io.rst

Lines changed: 23 additions & 2 deletions
@@ -3453,7 +3453,8 @@ Excel files
 The :func:`~pandas.read_excel` method can read Excel 2007+ (``.xlsx``) files
 using the ``openpyxl`` Python module. Excel 2003 (``.xls``) files
 can be read using ``xlrd``. Binary Excel (``.xlsb``)
-files can be read using ``pyxlsb``.
+files can be read using ``pyxlsb``. All of these formats can also be read
+using the :ref:`calamine<io.calamine>` engine.
 The :meth:`~DataFrame.to_excel` instance method is used for
 saving a ``DataFrame`` to Excel. Generally the semantics are
 similar to working with :ref:`csv<io.read_csv_table>` data.
@@ -3494,6 +3495,9 @@ using internally.
 
 * For the engine odf, pandas is using :func:`odf.opendocument.load` to read in (``.ods``) files.
 
+* For the engine calamine, pandas is using :func:`python_calamine.load_workbook`
+  to read in (``.xlsx``), (``.xlsm``), (``.xls``), (``.xlsb``), (``.ods``) files.
+
 .. code-block:: python
 
    # Returns a DataFrame
@@ -3935,7 +3939,8 @@ The :func:`~pandas.read_excel` method can also read binary Excel files
 using the ``pyxlsb`` module. The semantics and features for reading
 binary Excel files mostly match what can be done for `Excel files`_ using
 ``engine='pyxlsb'``. ``pyxlsb`` does not recognize datetime types
-in files and will return floats instead.
+in files and will return floats instead (you can use :ref:`calamine<io.calamine>`
+if you need to recognize datetime types).
 
 .. code-block:: python
 
@@ -3947,6 +3952,22 @@ in files and will return floats instead.
 Currently pandas only supports *reading* binary Excel files. Writing
 is not implemented.
 
+.. _io.calamine:
+
+Calamine (Excel and ODS files)
+------------------------------
+
+The :func:`~pandas.read_excel` method can read Excel files (``.xlsx``, ``.xlsm``, ``.xls``, ``.xlsb``)
+and OpenDocument spreadsheets (``.ods``) using the ``python-calamine`` module.
+This module is a binding for the Rust library `calamine <https://crates.io/crates/calamine>`__
+and is faster than other engines in most cases. The semantics and features for reading files
+match what can be done for `Excel files`_ using ``engine='calamine'``.
+The optional dependency 'python-calamine' needs to be installed.
+
+.. code-block:: python
+
+   # Returns a DataFrame
+   pd.read_excel("path_to_file.xlsb", engine="calamine")
 
 .. _io.clipboard:
 
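Reviewer note: because python-calamine is an optional dependency, calling code may want to fall back to the default engine when it is missing. The sketch below is one possible way to do that and is not part of this commit. The names `pick_excel_engine` and `read_spreadsheet` are hypothetical, and the sketch assumes a pandas version that accepts ``engine="calamine"`` (2.2.0 or later, per the whatsnew entry in this commit).

```python
import importlib.util


def pick_excel_engine():
    """Return "calamine" when python-calamine is importable, else None.

    Passing engine=None to pandas.read_excel lets pandas pick its
    default engine for the file type.
    """
    if importlib.util.find_spec("python_calamine") is not None:
        return "calamine"
    return None


def read_spreadsheet(path, **kwargs):
    """Hypothetical wrapper around pandas.read_excel (illustration only)."""
    # Imported lazily so pick_excel_engine() works without pandas installed.
    import pandas as pd

    return pd.read_excel(path, engine=pick_excel_engine(), **kwargs)
```

With this helper, `read_spreadsheet("path_to_file.xlsb")` would use calamine where available and otherwise defer to the engine pandas selects by default (``pyxlsb`` for ``.xlsb`` files).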
doc/source/whatsnew/v2.2.0.rst

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ enhancement2
 
 Other enhancements
 ^^^^^^^^^^^^^^^^^^
--
+- Added ``calamine`` as an engine to ``read_excel`` (:issue:`50395`)
 -
 
 .. ---------------------------------------------------------------------------

environment.yml

Lines changed: 1 addition & 0 deletions
@@ -115,5 +115,6 @@ dependencies:
   - pip:
     - dataframe-api-compat>=0.1.7
     - sphinx-toggleprompt # conda-forge version has stricter pins on jinja2
+    - python-calamine>=0.1.6
     - typing_extensions; python_version<"3.11"
     - tzdata>=2022.1
