DataFrame display fails after .loc in-place assignment for Int64 #1391

@ianozsvald

Description

System information

  • OS: Linux Mint 19.3, 64-bit
  • Modin version (modin.__version__): 0.7.2
  • Python version: 3.7.6
Using watermark:
CPython 3.7.6
IPython 7.13.0

pandas 1.0.1
modin 0.7.2
ray 0.8.0
dask 2.14.0
numexpr not installed

compiler   : GCC 7.3.0
system     : Linux
release    : 5.3.0-46-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 8
interpreter: 64bit

Describe the problem

Displaying a DataFrame fails in Modin after these steps: create a simple DataFrame, convert one column to Int64 (the nullable integer dtype), then assign NaN to a cell of that column with .loc. The display raises a TypeError inside a Ray worker.

Source code / logs

import os
os.environ["MODIN_ENGINE"] = "ray"  # Modin will use Ray
import modin.pandas as pd_md

import numpy as np
dfx = pd_md.DataFrame({'a': np.ones(10)})
dfx['a_I'] = dfx['a'].astype('Int64')
#dfx.loc[0, 'a_I'] = np.nan
dfx


a | a_I
-- | --
1.0 | 1 # dataframe as expected
1.0 | 1
...

Assigning a NaN value causes a failure:

import numpy as np
dfx = pd_md.DataFrame({'a': np.ones(10)})
dfx['a_I'] = dfx['a'].astype('Int64')
dfx.loc[0, 'a_I'] = np.nan
dfx

RayTaskError(TypeError)                   Traceback (most recent call last)
RayTaskError(TypeError): ray_worker (pid=24106, ip=192.168.0.129)
...
    raise TypeError("values must be a 1D list-like")
TypeError: values must be a 1D list-like

I first hit the Ray error when calling info(). Since simply displaying the DataFrame fails as well, the cause looks more fundamental than info() itself.
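To narrow down which display paths are affected, a small probe like the one below could be used. This is only a sketch: probe_display is a hypothetical helper name, not part of Modin or pandas, and it is shown here against plain pandas. Swapping the import for modin.pandas is an assumption about how one would probe Modin, including that its info() accepts buf the way pandas does.

```python
import io

import numpy as np
import pandas as pd  # assumption: replace with `import modin.pandas as pd` to probe Modin


def probe_display(df):
    """Hypothetical helper: map each display operation to None or the exception it raised."""
    operations = {
        "repr": lambda: repr(df),
        # Write info() output to a buffer so the probe itself prints nothing.
        "info": lambda: df.info(buf=io.StringIO()),
    }
    results = {}
    for name, operation in operations.items():
        try:
            operation()
            results[name] = None
        except Exception as exc:  # keep the exception so it can be inspected
            results[name] = exc
    return results


dfx = pd.DataFrame({"a": np.ones(10)})
dfx["a_I"] = dfx["a"].astype("Int64")
dfx.loc[0, "a_I"] = np.nan
print(probe_display(dfx))
```

With plain pandas both entries should come back as None. Any exception object in the result points at the operation that failed.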

The same code works in plain pandas:

import numpy as np
import pandas as pd

dfx = pd.DataFrame({'a': np.ones(10)})
dfx['a_I'] = dfx['a'].astype('Int64')
dfx.loc[0, 'a_I'] = np.nan
dfx

a | a_I
-- | --
1.0 | <NA>
1.0 | 1
1.0 | 1
...
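For reference, the pandas behaviour can also be checked without relying on the rendered table. The following is a minimal sketch using plain pandas only. It inspects the column after the .loc assignment, which is presumably the state Modin would need to reproduce.

```python
import numpy as np
import pandas as pd

dfx = pd.DataFrame({"a": np.ones(10)})
dfx["a_I"] = dfx["a"].astype("Int64")
dfx.loc[0, "a_I"] = np.nan  # NaN assigned into a nullable integer column

print(dfx["a_I"].dtype)            # dtype of the column after the assignment
print(dfx["a_I"].isna().sum())     # number of missing entries
print(dfx.loc[0, "a_I"] is pd.NA)  # whether the cell holds the pandas NA scalar
```

In pandas the column keeps its Int64 dtype and the assigned NaN is stored as the missing-value marker shown as <NA> in the table above.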

If I build the column from a list that already contains a NaN and then convert it to Int64, this works in Modin:

dfx = pd_md.DataFrame({'a': [1, 2, 3, np.nan]})
dfx['a'] = dfx['a'].astype('Int64')
#dfx.loc[0, 'a_I'] = np.nan
#dfx.info() # this works too
dfx

a
--
1
2
3
<NA>

If I uncomment the dfx.loc[0, 'a_I'] = np.nan line in the example above, it still works fine.
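Based on the observation that converting a column which already contains NaN works, one possible reordering is to assign the NaN while the column is still float and convert to Int64 afterwards. The sketch below shows this with plain pandas only. Whether the same ordering avoids the failure in Modin 0.7.2 is an assumption that has not been verified here.

```python
import numpy as np
import pandas as pd  # assumption: the same steps would be tried with modin.pandas

dfx = pd.DataFrame({"a": np.ones(10)})

# Assign the missing value while the data is still float64 ...
dfx["a_I"] = dfx["a"]
dfx.loc[0, "a_I"] = np.nan

# ... and only then convert to the nullable integer dtype.
dfx["a_I"] = dfx["a_I"].astype("Int64")

print(dfx["a_I"].dtype)
print(dfx["a_I"].isna().sum())
```

The resulting column matches the one from the failing example in pandas: Int64 dtype with a single missing value in the first row.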


Labels

Blocked ❌ (A pull request that is blocked), bug 🦗 (Something isn't working)
