@@ -12,10 +12,10 @@ pandas.
1212.. note ::
1313
1414 The choice of using ``NaN `` internally to denote missing data was largely
15- for simplicity and performance reasons. It differs from the MaskedArray
16- approach of, for example, :mod: ` scikits.timeseries `. We are hopeful that
17- NumPy will soon be able to provide a native NA type solution (similar to R)
18- performant enough to be used in pandas .
15+ for simplicity and performance reasons.
16+ Starting from pandas 1.0, some optional data types experiment
17+ with a native ``NA`` scalar using a mask-based approach. See
18+ :ref:`here <missing_data.NA>` for more.
1919
2020See the :ref:`cookbook<cookbook.missing_data>` for some advanced strategies.
2121
@@ -110,7 +110,7 @@ pandas objects provide compatibility between ``NaT`` and ``NaN``.
110110 .. _missing.inserting :
111111
112112Inserting missing data
113- ----------------------
113+ ~~~~~~~~~~~~~~~~~~~~~~
114114
115115You can insert missing values by simply assigning to containers. The
116116actual missing value used will be chosen based on the dtype.
@@ -135,9 +135,10 @@ For object containers, pandas will use the value given:
135135 s.loc[1 ] = np.nan
136136 s
137137
138+ .. _missing_data.calculations :
138139
139140Calculations with missing data
140- ------------------------------
141+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
141142
142143Missing values propagate naturally through arithmetic operations between pandas
143144objects.
@@ -771,3 +772,139 @@ the ``dtype="Int64"``.
771772 s
772773
773774 See :ref:`integer_na` for more.
775+
776+
777+ .. _missing_data.NA :
778+
779+ Experimental ``NA`` scalar to denote missing values
780+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
781+
782+ .. warning ::
783+
784+ Experimental: the behaviour of ``pd.NA`` can still change without warning.
785+
786+ .. versionadded :: 1.0.0
787+
788+ Starting from pandas 1.0, an experimental ``pd.NA`` value (singleton) is
789+ available to represent scalar missing values. At this moment, it is used in
790+ the nullable :doc:`integer <integer_na>`, boolean and
791+ :ref:`dedicated string <text.types>` data types as the missing value indicator.
792+
793+ The goal of ``pd.NA`` is to provide a "missing" indicator that can be used
794+ consistently across data types (instead of ``np.nan``, ``None`` or ``pd.NaT``
795+ depending on the data type).
796+
797+ For example, a Series with the nullable integer dtype uses ``pd.NA``
798+ for its missing values:
799+
800+ .. ipython :: python
801+
802+ s = pd.Series([1, 2, None], dtype="Int64")
803+ s
804+ s[2]
805+ s[2] is pd.NA
806+
807+ Currently, pandas does not yet use those data types by default (when creating
808+ a DataFrame or Series, or when reading in data), so you need to specify
809+ the dtype explicitly.
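The difference between the default and the explicitly requested dtype can be sketched as follows. This is a minimal illustration, assuming pandas 1.0 or later; the variable names ``default`` and ``nullable`` are only for this example.

```python
import pandas as pd

# Without an explicit dtype, the missing entry becomes NaN in a float column.
default = pd.Series([1, 2, None])

# With the nullable integer dtype, the values stay integers and the
# missing entry is represented by pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")

print(default.dtype)
print(nullable.dtype)
print(nullable[2] is pd.NA)
```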
810+
811+ Propagation in arithmetic and comparison operations
812+ ---------------------------------------------------
813+
814+ In general, missing values *propagate* in operations involving ``pd.NA``. When
815+ one of the operands is unknown, the outcome of the operation is also unknown.
816+
817+ For example, ``pd.NA`` propagates in arithmetic operations, similarly to
818+ ``np.nan``:
819+
820+ .. ipython :: python
821+
822+ pd.NA + 1
823+ "a" * pd.NA
824+
825+ In equality and comparison operations, ``pd.NA`` also propagates. This deviates
826+ from the behaviour of ``np.nan``, where equality and ordering comparisons with
827+ ``np.nan`` always return ``False`` (and ``!=`` returns ``True``).
828+
829+ .. ipython :: python
830+
831+ pd.NA == 1
832+ pd.NA == pd.NA
833+ pd.NA < 2.5
834+
835+ To check whether a value is ``pd.NA``, use the :func:`isna` function instead of
836+ an equality comparison:
837+
838+ .. ipython :: python
839+
840+ pd.isna(pd.NA)
841+
842+ An exception to this basic propagation rule is *reductions* (such as the
843+ mean or the minimum), where pandas defaults to skipping missing values. See
844+ :ref:`above <missing_data.calculations>` for more.
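The contrast between element-wise propagation and reductions can be sketched on a nullable integer Series. This is an illustrative example, assuming pandas 1.0 or later; the variable names are hypothetical.

```python
import pandas as pd

s = pd.Series([1, 2, None], dtype="Int64")

# Element-wise arithmetic: the missing entry stays missing.
shifted = s + 1

# Element-wise comparison: the result is unknown where the input is missing.
compared = s == 1

# Reductions skip missing values by default ...
total = s.sum()

# ... unless skipping is disabled, in which case the result is missing too.
total_strict = s.sum(skipna=False)

print(shifted)
print(compared)
print(total)
print(pd.isna(total_strict))
```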
845+
846+ Logical operations
847+ ------------------
848+
849+ For logical operations, ``pd.NA`` follows the rules of
850+ `three-valued logic <https://en.wikipedia.org/wiki/Three-valued_logic>`__ (or
851+ *Kleene logic*, similar to R, SQL and Julia). Under this logic, missing values
852+ propagate only when it is logically required.
853+
854+ For example, for the logical "or" operation (``|``), if one of the operands
855+ is ``True``, we already know the result will be ``True``, regardless of the
856+ other value (that is, regardless of whether the missing value would be ``True`` or ``False``).
857+ In this case, ``pd.NA`` does not propagate:
858+
859+ .. ipython :: python
860+
861+ True | False
862+ True | pd.NA
863+ pd.NA | True
864+
865+ On the other hand, if one of the operands is ``False``, the result depends
866+ on the value of the other operand. Therefore, in this case ``pd.NA``
867+ propagates:
868+
869+ .. ipython :: python
870+
871+ False | True
872+ False | False
873+ False | pd.NA
874+
875+ The behaviour of the logical "and" operation (``&``) can be derived using
876+ similar logic (where now ``pd.NA`` will not propagate if one of the operands
877+ is already ``False``):
878+
879+ .. ipython :: python
880+
881+ False & True
882+ False & False
883+ False & pd.NA
884+
885+ .. ipython :: python
886+
887+ True & True
888+ True & False
889+ True & pd.NA
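The individual cases above can be collected into a full truth table. The following is a small sketch, assuming pandas 1.0 or later; ``values`` and ``rows`` are names chosen only for this illustration.

```python
import pandas as pd

values = [True, False, pd.NA]

# One row per operand pair: (left, right, left | right, left & right).
rows = [
    (left, right, left | right, left & right)
    for left in values
    for right in values
]

for left, right, or_result, and_result in rows:
    print(f"{left!s:>5} {right!s:>5} | or: {or_result!s:>5} | and: {and_result!s:>5}")
```

The result is ``pd.NA`` only in the rows where the known operand does not already decide the outcome.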
890+
891+
892+ ``NA`` in a boolean context
893+ ---------------------------
894+
895+ Since the actual value of an NA is unknown, it is ambiguous to convert NA
896+ to a boolean value. The following raises an error:
897+
898+ .. ipython :: python
899+ :okexcept:
900+
901+ bool(pd.NA)
902+
903+ This also means that ``pd.NA`` cannot be used in a context where it is
904+ evaluated to a boolean, such as ``if condition: ...`` where ``condition`` can
905+ potentially be ``pd.NA``. In such cases, either check for ``pd.NA`` with
906+ :func:`isna` first, or prevent ``condition`` from being ``pd.NA``, for example by
907+ filling missing values beforehand.
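Both workarounds can be sketched briefly. This is an illustrative example, assuming pandas 1.0 or later; the helper ``describe`` is hypothetical and not part of pandas.

```python
import pandas as pd


def describe(condition):
    """Hypothetical helper: handle pd.NA before any truth test."""
    # Check for a missing value first, so ``condition`` is never
    # evaluated as a boolean while it is pd.NA.
    if pd.isna(condition):
        return "unknown"
    return "yes" if condition else "no"


print(describe(True))
print(describe(pd.NA))

# Alternative: fill missing values beforehand so pd.NA never reaches the test.
mask = pd.Series([True, pd.NA, False], dtype="boolean")
filled = mask.fillna(False)
print([describe(value) for value in filled])
```

Whether to fill with ``False`` or ``True`` depends on how unknown values should be treated in the application at hand.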
908+
909+ A similar situation occurs when using Series or DataFrame objects in ``if``
910+ statements, see :ref:`gotchas.truth`.