pandas-dev
diff --git a/‎asv_bench/benchmarks/replace.py
Lines changed: 17 additions & 0 deletions b/‎asv_bench/benchmarks/replace.py
Lines changed: 17 additions & 0 deletions
diff --git a/‎doc/source/development/contributing.rst
Lines changed: 130 additions & 0 deletions b/‎doc/source/development/contributing.rst
Lines changed: 130 additions & 0 deletions
diff --git a/‎doc/source/getting_started/10min.rst
Lines changed: 1 addition & 1 deletion b/‎doc/source/getting_started/10min.rst
Lines changed: 1 addition & 1 deletion
diff --git a/‎doc/source/getting_started/basics.rst
Lines changed: 6 additions & 6 deletions b/‎doc/source/getting_started/basics.rst
Lines changed: 6 additions & 6 deletions
diff --git a/‎doc/source/getting_started/comparison/comparison_with_r.rst
Lines changed: 4 additions & 4 deletions b/‎doc/source/getting_started/comparison/comparison_with_r.rst
Lines changed: 4 additions & 4 deletions
diff --git a/‎doc/source/user_guide/advanced.rst
Lines changed: 1 addition & 1 deletion b/‎doc/source/user_guide/advanced.rst
Lines changed: 1 addition & 1 deletion
diff --git a/‎doc/source/user_guide/cookbook.rst
Lines changed: 3 additions & 3 deletions b/‎doc/source/user_guide/cookbook.rst
Lines changed: 3 additions & 3 deletions
diff --git a/‎doc/source/user_guide/enhancingperf.rst
Lines changed: 6 additions & 6 deletions b/‎doc/source/user_guide/enhancingperf.rst
Lines changed: 6 additions & 6 deletions
@@ -36,6 +36,23 @@ def time_replace_series(self, inplace):
         self.s.replace(self.to_rep, inplace=inplace)
 
 
+class ReplaceList:
+    # GH#28099
+
+    params = [(True, False)]
+    param_names = ["inplace"]
+
+    def setup(self, inplace):
+        self.df = pd.DataFrame({"A": 0, "B": 0}, index=range(4 * 10 ** 7))
+
+    def time_replace_list(self, inplace):
+        self.df.replace([np.inf, -np.inf], np.nan, inplace=inplace)
+
+    def time_replace_list_one_match(self, inplace):
+        # the 1 can be held in self._df.blocks[0], while the inf and -inf cant
+        self.df.replace([np.inf, -np.inf, 1], np.nan, inplace=inplace)
+
+
 class Convert:
 
     params = (["DataFrame", "Series"], ["Timestamp", "Timedelta"])
 
@@ -699,6 +699,136 @@ You'll also need to
 
 See :ref:`contributing.warnings` for more.
 
+.. _contributing.type_hints:
+
+Type Hints
+----------
+
+*pandas* strongly encourages the use of :pep:`484` style type hints. New development should contain type hints and pull requests to annotate existing code are accepted as well!
+
+Style Guidelines
+~~~~~~~~~~~~~~~~
+
+Types imports should follow the ``from typing import ...`` convention. So rather than
+
+.. code-block:: python
+
+   import typing
+
+   primes = []  # type: typing.List[int]
+
+You should write
+
+.. code-block:: python
+
+   from typing import List, Optional, Union
+
+   primes = []  # type: List[int]
+
+``Optional`` should be used where applicable, so instead of
+
+.. code-block:: python
+
+   maybe_primes = []  # type: List[Union[int, None]]
+
+You should write
+
+.. code-block:: python
+
+   maybe_primes = []  # type: List[Optional[int]]
+
+In some cases in the code base classes may define class variables that shadow builtins. This causes an issue as described in `Mypy 1775 <https://github.com/python/mypy/issues/1775#issuecomment-310969854>`_. The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like
+
+.. code-block:: python
+
+   class SomeClass1:
+       str = None
+
+The appropriate way to annotate this would be as follows
+
+.. code-block:: python
+
+   str_type = str
+
+   class SomeClass2:
+       str = None  # type: str_type
+
+In some cases you may be tempted to use ``cast`` from the typing module when you know better than the analyzer. This occurs particularly when using custom inference functions. For example
+
+.. code-block:: python
+
+   from typing import cast
+
+   from pandas.core.dtypes.common import is_number
+
+   def cannot_infer_bad(obj: Union[str, int, float]):
+
+       if is_number(obj):
+           ...
+       else:  # Reasonably only str objects would reach this but...
+           obj = cast(str, obj)  # Mypy complains without this!
+	   return obj.upper()
+
+The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_. While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable
+
+.. code-block:: python
+
+   def cannot_infer_good(obj: Union[str, int, float]):
+
+       if isinstance(obj, str):
+           return obj.upper()
+       else:
+           ...
+
+With custom types and inference this is not always possible so exceptions are made, but every effort should be exhausted to avoid ``cast`` before going down such paths.
+
+Syntax Requirements
+~~~~~~~~~~~~~~~~~~~
+
+Because *pandas* still supports Python 3.5, :pep:`526` does not apply and variables **must** be annotated with type comments. Specifically, this is a valid annotation within pandas:
+
+.. code-block:: python
+
+   primes = []  # type: List[int]
+
+Whereas this is **NOT** allowed:
+
+.. code-block:: python
+
+   primes: List[int] = []  # not supported in Python 3.5!
+
+Note that function signatures can always be annotated per :pep:`3107`:
+
+.. code-block:: python
+
+   def sum_of_primes(primes: List[int] = []) -> int:
+       ...
+
+
+Pandas-specific Types
+~~~~~~~~~~~~~~~~~~~~~
+
+Commonly used types specific to *pandas* will appear in `pandas._typing <https://github.com/pandas-dev/pandas/blob/master/pandas/_typing.py>`_ and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas.
+
+For example, quite a few functions in *pandas* accept a ``dtype`` argument. This can be expressed as a string like ``"object"``, a ``numpy.dtype`` like ``np.int64`` or even a pandas ``ExtensionDtype`` like ``pd.CategoricalDtype``. Rather than burden the user with having to constantly annotate all of those options, this can simply be imported and reused from the pandas._typing module
+
+.. code-block:: python
+
+   from pandas._typing import Dtype
+
+   def as_type(dtype: Dtype) -> ...:
+       ...
+
+This module will ultimately house types for repeatedly used concepts like "path-like", "array-like", "numeric", etc... and can also hold aliases for commonly appearing parameters like `axis`. Development of this module is active so be sure to refer to the source for the most up to date list of available types.
+
+Validating Type Hints
+~~~~~~~~~~~~~~~~~~~~~
+
+*pandas* uses `mypy <http://mypy-lang.org>`_ to statically analyze the code base and type hints. After making any change you can ensure your type hints are correct by running
+
+.. code-block:: shell
+
+   mypy pandas
 
 .. _contributing.ci:
 
 
@@ -278,7 +278,7 @@ Using a single column's values to select data.
 
 .. ipython:: python
 
-   df[df.A > 0]
+   df[df['A'] > 0]
 
 Selecting values from a DataFrame where a boolean condition is met.
 
 
@@ -926,7 +926,7 @@ Single aggregations on a ``Series`` this will return a scalar value:
 
 .. ipython:: python
 
-   tsdf.A.agg('sum')
+   tsdf['A'].agg('sum')
 
 
 Aggregating with multiple functions
@@ -950,13 +950,13 @@ On a ``Series``, multiple functions return a ``Series``, indexed by the function
 
 .. ipython:: python
 
-   tsdf.A.agg(['sum', 'mean'])
+   tsdf['A'].agg(['sum', 'mean'])
 
 Passing a ``lambda`` function will yield a ``<lambda>`` named row:
 
 .. ipython:: python
 
-   tsdf.A.agg(['sum', lambda x: x.mean()])
+   tsdf['A'].agg(['sum', lambda x: x.mean()])
 
 Passing a named function will yield that name for the row:
 
@@ -965,7 +965,7 @@ Passing a named function will yield that name for the row:
    def mymean(x):
        return x.mean()
 
-   tsdf.A.agg(['sum', mymean])
+   tsdf['A'].agg(['sum', mymean])
 
 Aggregating with a dict
 +++++++++++++++++++++++
@@ -1065,7 +1065,7 @@ Passing a single function to ``.transform()`` with a ``Series`` will yield a sin
 
 .. ipython:: python
 
-   tsdf.A.transform(np.abs)
+   tsdf['A'].transform(np.abs)
 
 
 Transform with multiple functions
@@ -1084,7 +1084,7 @@ resulting column names will be the transforming functions.
 
 .. ipython:: python
 
-   tsdf.A.transform([np.abs, lambda x: x + 1])
+   tsdf['A'].transform([np.abs, lambda x: x + 1])
 
 
 Transforming with a dict
 
@@ -81,7 +81,7 @@ R                                            pandas
 ===========================================  ===========================================
 ``select(df, col_one = col1)``               ``df.rename(columns={'col1': 'col_one'})['col_one']``
 ``rename(df, col_one = col1)``               ``df.rename(columns={'col1': 'col_one'})``
-``mutate(df, c=a-b)``                        ``df.assign(c=df.a-df.b)``
+``mutate(df, c=a-b)``                        ``df.assign(c=df['a']-df['b'])``
 ===========================================  ===========================================
 
 
@@ -258,8 +258,8 @@ index/slice as well as standard boolean indexing:
 
    df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
    df.query('a <= b')
-   df[df.a <= df.b]
-   df.loc[df.a <= df.b]
+   df[df['a'] <= df['b']]
+   df.loc[df['a'] <= df['b']]
 
 For more details and examples see :ref:`the query documentation
 <indexing.query>`.
@@ -284,7 +284,7 @@ In ``pandas`` the equivalent expression, using the
 
    df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
    df.eval('a + b')
-   df.a + df.b  # same as the previous expression
+   df['a'] + df['b']  # same as the previous expression
 
 In certain cases :meth:`~pandas.DataFrame.eval` will be much faster than
 evaluation in pure Python. For more details and examples see :ref:`the eval
 
@@ -738,7 +738,7 @@ and allows efficient indexing and storage of an index with a large number of dup
    df['B'] = df['B'].astype(CategoricalDtype(list('cab')))
    df
    df.dtypes
-   df.B.cat.categories
+   df['B'].cat.categories
 
 Setting the index will create a ``CategoricalIndex``.
 
 
@@ -592,8 +592,8 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
 .. ipython:: python
 
    df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 1, 1], columns=['A'])
-   df.A.groupby((df.A != df.A.shift()).cumsum()).groups
-   df.A.groupby((df.A != df.A.shift()).cumsum()).cumsum()
+   df['A'].groupby((df['A'] != df['A'].shift()).cumsum()).groups
+   df['A'].groupby((df['A'] != df['A'].shift()).cumsum()).cumsum()
 
 Expanding data
 **************
@@ -719,7 +719,7 @@ Rolling Apply to multiple columns where function calculates a Series before a Sc
    df
 
    def gm(df, const):
-       v = ((((df.A + df.B) + 1).cumprod()) - 1) * const
+       v = ((((df['A'] + df['B']) + 1).cumprod()) - 1) * const
        return v.iloc[-1]
 
    s = pd.Series({df.index[i]: gm(df.iloc[i:min(i + 51, len(df) - 1)], 5)
 
@@ -393,15 +393,15 @@ Consider the following toy example of doubling each observation:
 .. code-block:: ipython
 
    # Custom function without numba
-   In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba)  # noqa E501
+   In [5]: %timeit df['col1_doubled'] = df['a'].apply(double_every_value_nonumba)  # noqa E501
    1000 loops, best of 3: 797 us per loop
 
    # Standard implementation (faster than a custom function)
-   In [6]: %timeit df['col1_doubled'] = df.a * 2
+   In [6]: %timeit df['col1_doubled'] = df['a'] * 2
    1000 loops, best of 3: 233 us per loop
 
    # Custom function with numba
-   In [7]: %timeit (df['col1_doubled'] = double_every_value_withnumba(df.a.to_numpy())
+   In [7]: %timeit (df['col1_doubled'] = double_every_value_withnumba(df['a'].to_numpy())
    1000 loops, best of 3: 145 us per loop
 
 Caveats
@@ -643,8 +643,8 @@ The equivalent in standard Python would be
 .. ipython:: python
 
    df = pd.DataFrame(dict(a=range(5), b=range(5, 10)))
-   df['c'] = df.a + df.b
-   df['d'] = df.a + df.b + df.c
+   df['c'] = df['a'] + df['b']
+   df['d'] = df['a'] + df['b'] + df['c']
    df['a'] = 1
    df
 
@@ -688,7 +688,7 @@ name in an expression.
 
    a = np.random.randn()
    df.query('@a < a')
-   df.loc[a < df.a]  # same as the previous expression
+   df.loc[a < df['a']]  # same as the previous expression
 
 With :func:`pandas.eval` you cannot use the ``@`` prefix *at all*, because it
 isn't defined in that context. ``pandas`` will let you know this if you try to