fillna() does not work when value parameter is a list #3435

@ijmcf

Should raise on a passed list to value

The results from the fillna() method are very strange when the value parameter is given a list.

For example, using a simple DataFrame:

import numpy
import pandas

df = pandas.DataFrame({'A': [numpy.nan, 1, 2], 'B': [10, numpy.nan, 12], 'C': [[20, 21, 22], [23, 24, 25], numpy.nan]})
df
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 NaN

df.fillna(value=[100, 101, 102])
A B C
0 100 10 [20, 21, 22]
1 1 101 [23, 24, 25]
2 2 12 102

So it appears the values in the list are used to fill the 'holes' in order, if the list has the same length as the number of holes. But if the list is shorter than the number of holes, the behavior changes to using only the first value in the list:

df.fillna(value=[100, 101])
A B C
0 100 10 [20, 21, 22]
1 1 100 [23, 24, 25]
2 2 12 100

If the list is longer than the number of holes, you get something even more odd:

df.fillna(value=[100, 101, 102, 103])
A B C
0 100 10 [20, 21, 22]
1 1 100 [23, 24, 25]
2 2 12 102

If you provide a dict that specifies the fill values by column, the values from the list are used within that column only:

df.fillna(value={'C': [100, 101]})
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 100

Since it's not always practical to know the number of NaN values a priori, or to customize the length of the value list to match it, this is problematic. Furthermore, some desired values get over-interpreted and cannot be used:

For example, I can't figure out how to replace all NaN instances in a single column with the same list (either empty or non-empty):

df.fillna(value={'C': [[100,101]]})
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 100

Indeed, if you specify the empty list nothing is filled:

df.fillna(value={'C': list()})
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 NaN

But a dict works fine:

df.fillna(value={'C': {0: 1}})
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 {0: 1}

df.fillna(value={'C': dict()})
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 {}
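
For the list case, one possible workaround is to bypass fillna() for that column and build the values by hand, giving every missing cell its own copy of the list. This is only a sketch: the helper name fill_with_list is hypothetical and not part of pandas, and it assumes that only None and float NaN count as missing.

```python
import numpy as np
import pandas as pd


def fill_with_list(series, fill):
    """Return an object-dtype copy of `series` with each missing cell set to a copy of `fill`.

    Hypothetical helper for illustration; not a pandas API.
    """
    out = np.empty(len(series), dtype=object)
    for i, cell in enumerate(series):
        # Only None and float NaN count as missing; list and dict cells are kept as-is.
        missing = cell is None or (isinstance(cell, float) and np.isnan(cell))
        # Copy the list so the filled cells do not share one mutable object.
        out[i] = list(fill) if missing else cell
    return pd.Series(out, index=series.index, name=series.name)


df = pd.DataFrame({'A': [np.nan, 1, 2],
                   'B': [10, np.nan, 12],
                   'C': [[20, 21, 22], [23, 24, 25], np.nan]})
df['C'] = fill_with_list(df['C'], [100, 101])
```

Passing an empty list as `fill` works the same way, because the helper never asks pandas to interpret the list as a sequence of fill values.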

So it appears that fillna() is making a lot of decisions about how the fill values should be applied, and certain desired outcomes can't be achieved because it's being too 'clever'.
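
To illustrate the "should raise on a passed list" idea, here is a minimal sketch of a wrapper that rejects list-like fill values before calling fillna(). The name strict_fillna and the error wording are assumptions made for this example; this is not how pandas itself implements or words the check.

```python
import numpy as np
import pandas as pd


def strict_fillna(df, value):
    """Call df.fillna(value), but reject list-like fill values up front.

    Hypothetical wrapper for illustration; not a pandas API.
    """
    def reject_sequence(v, where):
        # Lists and tuples are ambiguous: one value per hole, or one value for all holes?
        if isinstance(v, (list, tuple)):
            raise TypeError(
                f"fill value {where} must be a scalar or dict, got {type(v).__name__}"
            )

    reject_sequence(value, "for the frame")
    if isinstance(value, dict):
        # Also check the per-column values of a {column: fill} mapping.
        for column, v in value.items():
            reject_sequence(v, f"for column {column!r}")
    return df.fillna(value)
```

With this wrapper, strict_fillna(df, 0) and strict_fillna(df, {'A': 0}) behave like plain fillna(), while strict_fillna(df, [100, 101]) and strict_fillna(df, {'C': [100, 101]}) fail immediately instead of silently picking an interpretation.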
