From 114ef6a2f730559f1050c47802b25beaf393b330 Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Wed, 1 Feb 2023 11:01:02 +0000 Subject: [PATCH 1/6] DOC: Update docs to reflect that Index can hold int64, int32 etc. arrays --- doc/source/development/internals.rst | 3 - doc/source/user_guide/advanced.rst | 124 ++++----------------------- doc/source/user_guide/indexing.rst | 22 ++++- doc/source/user_guide/io.rst | 2 +- doc/source/user_guide/timedeltas.rst | 2 +- doc/source/whatsnew/v2.0.0.rst | 72 ++++++++++++++++ 6 files changed, 112 insertions(+), 113 deletions(-) diff --git a/doc/source/development/internals.rst b/doc/source/development/internals.rst index f9cff9634f3cb..3dd687ef2087d 100644 --- a/doc/source/development/internals.rst +++ b/doc/source/development/internals.rst @@ -19,9 +19,6 @@ containers for the axis labels: assuming nothing about its contents. The labels must be hashable (and likely immutable) and unique. Populates a dict of label to location in Cython to do ``O(1)`` lookups. -* ``Int64Index``: a version of ``Index`` highly optimized for 64-bit integer - data, such as time stamps -* ``Float64Index``: a version of ``Index`` highly optimized for 64-bit float data * :class:`MultiIndex`: the standard hierarchical index object * :class:`DatetimeIndex`: An Index object with :class:`Timestamp` boxed elements (impl are the int64 values) * :class:`TimedeltaIndex`: An Index object with :class:`Timedelta` boxed elements (impl are the in64 values) diff --git a/doc/source/user_guide/advanced.rst b/doc/source/user_guide/advanced.rst index b8df21ab5a5b4..24c2f0b74a12f 100644 --- a/doc/source/user_guide/advanced.rst +++ b/doc/source/user_guide/advanced.rst @@ -848,125 +848,35 @@ values **not** in the categories, similarly to how you can reindex **any** panda .. _advanced.rangeindex: -Int64Index and RangeIndex -~~~~~~~~~~~~~~~~~~~~~~~~~ +RangeIndex +~~~~~~~~~~ -.. 
deprecated:: 1.4.0 - In pandas 2.0, :class:`Index` will become the default index type for numeric types - instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types - are therefore deprecated and will be removed in a futire version. - ``RangeIndex`` will not be removed, as it represents an optimized version of an integer index. - -:class:`Int64Index` is a fundamental basic index in pandas. This is an immutable array -implementing an ordered, sliceable set. - -:class:`RangeIndex` is a sub-class of ``Int64Index`` that provides the default index for all ``NDFrame`` objects. -``RangeIndex`` is an optimized version of ``Int64Index`` that can represent a monotonic ordered set. These are analogous to Python `range types `__. - -.. _advanced.float64index: - -Float64Index -~~~~~~~~~~~~ - -.. deprecated:: 1.4.0 - :class:`Index` will become the default index type for numeric types in the future - instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types - are therefore deprecated and will be removed in a future version of Pandas. - ``RangeIndex`` will not be removed as it represents an optimized version of an integer index. - -By default a :class:`Float64Index` will be automatically created when passing floating, or mixed-integer-floating values in index creation. -This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the -same. - -.. ipython:: python - - indexf = pd.Index([1.5, 2, 3, 4.5, 5]) - indexf - sf = pd.Series(range(5), index=indexf) - sf - -Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``). +:class:`RangeIndex` is a sub-class of :class:`Index` that provides the default index for all :class:`DataFrame` and :class:`Series` objects. +``RangeIndex`` is an optimized version of ``Index`` that can represent a monotonic ordered set. 
These are analogous to Python `range types `__. +A ``RangeIndex`` will always have an ``int64`` dtype. .. ipython:: python - sf[3] - sf[3.0] - sf.loc[3] - sf.loc[3.0] + idx = pd.RangeIndex(5) + idx -The only positional indexing is via ``iloc``. +``RangeIndex`` is the default index for all :class:`DataFrame` and :class:`Series` objects: .. ipython:: python - sf.iloc[3] + ser = pd.Series([1, 2, 3]) + ser.index + df = pd.DataFrame([[1, 2], [3, 4]]) + df.index + df.columns -A scalar index that is not found will raise a ``KeyError``. -Slicing is primarily on the values of the index when using ``[],ix,loc``, and -**always** positional when using ``iloc``. The exception is when the slice is -boolean, in which case it will always be positional. - -.. ipython:: python - - sf[2:4] - sf.loc[2:4] - sf.iloc[2:4] - -In float indexes, slicing using floats is allowed. - -.. ipython:: python - - sf[2.1:4.6] - sf.loc[2.1:4.6] - -In non-float indexes, slicing using floats will raise a ``TypeError``. - -.. code-block:: ipython - - In [1]: pd.Series(range(5))[3.5] - TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index) - - In [1]: pd.Series(range(5))[3.5:4.5] - TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index) - -Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat -irregular timedelta-like indexing scheme, but the data is recorded as floats. This could, for -example, be millisecond offsets. - -.. ipython:: python - - dfir = pd.concat( - [ - pd.DataFrame( - np.random.randn(5, 2), index=np.arange(5) * 250.0, columns=list("AB") - ), - pd.DataFrame( - np.random.randn(6, 2), - index=np.arange(4, 10) * 250.1, - columns=list("AB"), - ), - ] - ) - dfir - -Selection operations then will always work on a value basis, for all selection operators. - -.. 
ipython:: python - - dfir[0:1000.4] - dfir.loc[0:1001, "A"] - dfir.loc[1000.4] - -You could retrieve the first 1 second (1000 ms) of data as such: - -.. ipython:: python - - dfir[0:1000] - -If you need integer based selection, you should use ``iloc``: +A ``RangeIndex`` will behave similarly to a :class:`Index` with an ``int64`` dtype and operations on a ``RangeIndex``, +whose result cannot be represented by a ``RangeIndex``, but should have an integer dtype, will be converted to an ``Index`` with ``int64``. +For example: .. ipython:: python - dfir.iloc[0:5] + idx[[0, 2] .. _advanced.intervalindex: diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst index 276157b2868b4..76ca27593bf85 100644 --- a/doc/source/user_guide/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -1582,8 +1582,28 @@ lookups, data alignment, and reindexing. The easiest way to create an index 'd' in index -You can also pass a ``name`` to be stored in the index: +or using numbers: + +.. ipython:: python + + index = pd.Index([1, 5, 12]) + index + 5 in index + +If no dtype is given ``Index`` tries to infer the dtype from the data. + +It is also possible to give a explicit dtype when instantiating an :class:`Index`: +.. ipython:: python + + index = pd.Index(['e', 'd', 'a', 'b'], dtype="string") + index + index = pd.Index([1, 5, 12], dtype="int8") + index + index = pd.Index([1, 5, 12], dtype="float32") + index + +You can also pass a ``name`` to be stored in the index: .. ipython:: python diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index dc21b9f35d272..50aabad2d0bd3 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -4756,7 +4756,7 @@ Selecting coordinates ^^^^^^^^^^^^^^^^^^^^^ Sometimes you want to get the coordinates (a.k.a the index locations) of your query. This returns an -``Int64Index`` of the resulting locations. 
These coordinates can also be passed to subsequent +``Index`` of the resulting locations. These coordinates can also be passed to subsequent ``where`` operations. .. ipython:: python diff --git a/doc/source/user_guide/timedeltas.rst b/doc/source/user_guide/timedeltas.rst index 318ca045847f4..3a75aa0b39b1f 100644 --- a/doc/source/user_guide/timedeltas.rst +++ b/doc/source/user_guide/timedeltas.rst @@ -477,7 +477,7 @@ Scalars type ops work as well. These can potentially return a *different* type o # division can result in a Timedelta if the divisor is an integer tdi / 2 - # or a Float64Index if the divisor is a Timedelta + # or a float64 Index if the divisor is a Timedelta tdi / tdi[0] .. _timedeltas.resampling: diff --git a/doc/source/whatsnew/v2.0.0.rst b/doc/source/whatsnew/v2.0.0.rst index f2615950afec1..6e75d9b8415d9 100644 --- a/doc/source/whatsnew/v2.0.0.rst +++ b/doc/source/whatsnew/v2.0.0.rst @@ -28,6 +28,77 @@ The available extras, found in the :ref:`installation guide` for more information (:issue:`42717`) - Removed deprecated :attr:`Timestamp.freq`, :attr:`Timestamp.freqstr` and argument ``freq`` from the :class:`Timestamp` constructor and :meth:`Timestamp.fromordinal` (:issue:`14146`) - Removed deprecated :class:`CategoricalBlock`, :meth:`Block.is_categorical`, require datetime64 and timedelta64 values to be wrapped in :class:`DatetimeArray` or :class:`TimedeltaArray` before passing to :meth:`Block.make_block_same_class`, require ``DatetimeTZBlock.values`` to have the correct ndim when passing to the :class:`BlockManager` constructor, and removed the "fastpath" keyword from the :class:`SingleBlockManager` constructor (:issue:`40226`, :issue:`40571`) - Removed deprecated global option ``use_inf_as_null`` in favor of ``use_inf_as_na`` (:issue:`17126`) From 4721fb0696349940aa9a4eb7e60d5cf55852ba5d Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Wed, 1 Feb 2023 22:11:56 +0000 Subject: [PATCH 2/6] fix doc build issues --- 
doc/source/user_guide/advanced.rst | 2 +- doc/source/whatsnew/v2.0.0.rst | 16 +++++++++------- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/doc/source/user_guide/advanced.rst b/doc/source/user_guide/advanced.rst index 24c2f0b74a12f..d6f7bc67b543c 100644 --- a/doc/source/user_guide/advanced.rst +++ b/doc/source/user_guide/advanced.rst @@ -876,7 +876,7 @@ For example: .. ipython:: python - idx[[0, 2] + idx[[0, 2]] .. _advanced.intervalindex: diff --git a/doc/source/whatsnew/v2.0.0.rst b/doc/source/whatsnew/v2.0.0.rst index 6e75d9b8415d9..b80a6d81e9de0 100644 --- a/doc/source/whatsnew/v2.0.0.rst +++ b/doc/source/whatsnew/v2.0.0.rst @@ -39,12 +39,12 @@ Previously it was only possible to use ``int64``, ``uint64`` & ``float64`` dtype .. code-block:: ipython - In [1]: pd.Index(1, 2, 3], dtype=np.int8) - Out[1]: Int64Index(1, 2, 3], dtype="int64") - In [2]: pd.Index(1, 2, 3], dtype=np.uint16) - Out[2]: UInt64Index(1, 2, 3], dtype="uint64") - In [3]: pd.Index(1, 2, 3], dtype=np.float32) - Out[3]: Float64Index(1.0, 2.0, 3.0], dtype="float64") + In [1]: pd.Index([1, 2, 3], dtype=np.int8) + Out[1]: Int64Index([1, 2, 3], dtype="int64") + In [2]: pd.Index([1, 2, 3], dtype=np.uint16) + Out[2]: UInt64Index([1, 2, 3], dtype="uint64") + In [3]: pd.Index([1, 2, 3], dtype=np.float32) + Out[3]: Float64Index([1.0, 2.0, 3.0], dtype="float64") :class:`Int64Index`, :class:`UInt64Index` & :class:`Float64Index` were deprecated in pandas version 1.4 and have now been removed. Instead :class:`Index` should be used directly, and @@ -69,7 +69,7 @@ Below is a possibly non-exhaustive list of changes: signed integer arrays previously returned an index with ``int64`` dtype, but will now reuse the dtype of the supplied numpy array. So ``Index(np.array([1, 2, 3]))`` will be ``int32`` on 32-bit systems. Instantiating :class:`Index` using a list of numbers will still return 64-bit dtypes, - e.g. ``Index( [1, 2, 3])`` will have a ``int64`` dtype, which is the same as previously. + e.g.
``Index([1, 2, 3])`` will have an ``int64`` dtype, which is the same as previously. 2. The various numeric datetime attributes of :class:`DatetimeIndex` (:attr:`~DatetimeIndex.day`, :attr:`~DatetimeIndex.month`, :attr:`~DatetimeIndex.year` etc.) were previously of dtype ``int64``, while they were ``int32`` for :class:`DatetimeArray`. They are now @@ -87,6 +87,7 @@ Below is a possibly non-exhaustive list of changes: A = sparse.coo_matrix( ([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4) + ) ser = pd.Series.sparse.from_coo(A) ser.index.dtype @@ -95,6 +96,7 @@ Below is a possibly non-exhaustive list of changes: ``float64`` dtype. It now raises a ``NotImplementedError``: .. ipython:: python + :okexcept: pd.Index([1, 2, 3], dtype=np.float16) From b05448e5adf3e0832c1e43db8c8a39ab834761c8 Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Wed, 1 Feb 2023 22:21:54 +0000 Subject: [PATCH 3/6] fix spelling --- doc/source/user_guide/indexing.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst index 76ca27593bf85..ec7fa5356aada 100644 --- a/doc/source/user_guide/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -1590,9 +1590,8 @@ or using numbers: index 5 in index -If no dtype is given ``Index`` tries to infer the dtype from the data. - -It is also possible to give an explicit dtype when instantiating an :class:`Index`: +If no dtype is given, ``Index`` tries to infer the dtype from the data. +It is also possible to give an explicit dtype when instantiating an :class:`Index`: ..
ipython:: python From e0b5c65071da4e5595922672635b4c7e109b9388 Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Thu, 2 Feb 2023 06:28:08 +0000 Subject: [PATCH 4/6] fix comments and bugs --- doc/source/whatsnew/v2.0.0.rst | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/doc/source/whatsnew/v2.0.0.rst b/doc/source/whatsnew/v2.0.0.rst index b80a6d81e9de0..780600fdfa452 100644 --- a/doc/source/whatsnew/v2.0.0.rst +++ b/doc/source/whatsnew/v2.0.0.rst @@ -53,9 +53,9 @@ can it now take all numpy numeric dtypes, i.e. .. ipython:: python - pd.Index(1, 2, 3], dtype=np.int8) - pd.Index(1, 2, 3], dtype=np.uint16) - pd.Index(1, 2, 3], dtype=np.float32) + pd.Index([1, 2, 3], dtype=np.int8) + pd.Index([1, 2, 3], dtype=np.uint16) + pd.Index([1, 2, 3], dtype=np.float32) The ability for ``Index`` to hold the numpy numeric dtypes has meant some changes in Pandas functionality. In particular, operations that previously were forced to create 64-bit indexes, @@ -81,10 +81,13 @@ Below is a possibly non-exhaustive list of changes: idx.array.year idx.year -3. Level dtypes on Indexes from :attr:`Series.sparse.from_coo` are now of dtype ``int32``. +3. Level dtypes on Indexes from :meth:`Series.sparse.from_coo` are now of dtype ``int32``, + the same as they are on the ``rows``/``cols`` on a scipy sparse matrix. Previously they + were of dtype ``int64``. .. 
ipython:: python + from scipy import sparse A = sparse.coo_matrix( ([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4) ) From 9a5f69589fb24d6a1ebcccf50175db41fafad458 Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Thu, 2 Feb 2023 08:54:07 +0000 Subject: [PATCH 5/6] fix doc build --- doc/source/whatsnew/v0.13.0.rst | 2 +- doc/source/whatsnew/v2.0.0.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/source/whatsnew/v0.13.0.rst b/doc/source/whatsnew/v0.13.0.rst index df9f0a953ffab..8ce038200acc4 100644 --- a/doc/source/whatsnew/v0.13.0.rst +++ b/doc/source/whatsnew/v0.13.0.rst @@ -310,7 +310,7 @@ Float64Index API change - Added a new index type, ``Float64Index``. This will be automatically created when passing floating values in index creation. This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the - same. See :ref:`the docs`, (:issue:`263`) + same. (:issue:`263`) Construction is by default for floating type values. diff --git a/doc/source/whatsnew/v2.0.0.rst b/doc/source/whatsnew/v2.0.0.rst index 780600fdfa452..443066f9f6366 100644 --- a/doc/source/whatsnew/v2.0.0.rst +++ b/doc/source/whatsnew/v2.0.0.rst @@ -759,7 +759,7 @@ Deprecations Removal of prior version deprecations/changes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- Removed :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index`. See also :ref:`here <_whatsnew_200.enhancements.optional_dependency_management_pip>` for more information (:issue:`42717`) +- Removed :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index`. 
See also :ref:`here ` for more information (:issue:`42717`) - Removed deprecated :attr:`Timestamp.freq`, :attr:`Timestamp.freqstr` and argument ``freq`` from the :class:`Timestamp` constructor and :meth:`Timestamp.fromordinal` (:issue:`14146`) - Removed deprecated :class:`CategoricalBlock`, :meth:`Block.is_categorical`, require datetime64 and timedelta64 values to be wrapped in :class:`DatetimeArray` or :class:`TimedeltaArray` before passing to :meth:`Block.make_block_same_class`, require ``DatetimeTZBlock.values`` to have the correct ndim when passing to the :class:`BlockManager` constructor, and removed the "fastpath" keyword from the :class:`SingleBlockManager` constructor (:issue:`40226`, :issue:`40571`) - Removed deprecated global option ``use_inf_as_null`` in favor of ``use_inf_as_na`` (:issue:`17126`) From d80168eb952ed3dec331507d3e28c92ca04e129c Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Thu, 2 Feb 2023 09:00:23 +0000 Subject: [PATCH 6/6] fix doc build II --- doc/source/whatsnew/v2.0.0.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/source/whatsnew/v2.0.0.rst b/doc/source/whatsnew/v2.0.0.rst index 443066f9f6366..ec3ae500a2c11 100644 --- a/doc/source/whatsnew/v2.0.0.rst +++ b/doc/source/whatsnew/v2.0.0.rst @@ -28,7 +28,7 @@ The available extras, found in the :ref:`installation guide` for more information (:issue:`42717`) +- Removed :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index`. 
See also :ref:`here ` for more information (:issue:`42717`) - Removed deprecated :attr:`Timestamp.freq`, :attr:`Timestamp.freqstr` and argument ``freq`` from the :class:`Timestamp` constructor and :meth:`Timestamp.fromordinal` (:issue:`14146`) - Removed deprecated :class:`CategoricalBlock`, :meth:`Block.is_categorical`, require datetime64 and timedelta64 values to be wrapped in :class:`DatetimeArray` or :class:`TimedeltaArray` before passing to :meth:`Block.make_block_same_class`, require ``DatetimeTZBlock.values`` to have the correct ndim when passing to the :class:`BlockManager` constructor, and removed the "fastpath" keyword from the :class:`SingleBlockManager` constructor (:issue:`40226`, :issue:`40571`) - Removed deprecated global option ``use_inf_as_null`` in favor of ``use_inf_as_na`` (:issue:`17126`)
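The behaviour these doc changes describe can be checked with a short script. This is only a minimal sketch, assuming pandas 2.0 or later and NumPy are installed; the variable names (``default_index``, ``picked``, ``small``) are illustrative and not part of the patch.

```python
import numpy as np
import pandas as pd

# RangeIndex remains the default index for new objects and has an int64 dtype.
ser = pd.Series([1, 2, 3])
default_index = ser.index
print(type(default_index).__name__, default_index.dtype)

# Picking arbitrary positions cannot be expressed as a range, so the
# result is no longer a RangeIndex but an integer-dtype index.
picked = pd.RangeIndex(5)[[0, 2]]
print(type(picked).__name__, picked.dtype)

# Assumption: on pandas >= 2.0 the requested numpy dtype is kept;
# older versions would have upcast this to a 64-bit index.
small = pd.Index([1, 5, 12], dtype=np.int8)
print(small.dtype)
```

Printing the type name alongside the dtype shows which results remain a ``RangeIndex`` and which fall back to a plain ``Index``.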