Description
Should raise when a list is passed to value
The results from the fillna() method are very strange when the value parameter is given a list.
For example, take this small DataFrame:
import numpy
import pandas
df = pandas.DataFrame({'A': [numpy.nan, 1, 2], 'B': [10, numpy.nan, 12], 'C': [[20, 21, 22], [23, 24, 25], numpy.nan]})
df
     A    B             C
0  NaN   10  [20, 21, 22]
1    1  NaN  [23, 24, 25]
2    2   12           NaN
df.fillna(value=[100, 101, 102])
     A    B             C
0  100   10  [20, 21, 22]
1    1  101  [23, 24, 25]
2    2   12           102
So the values in the list appear to fill the 'holes' in order when the list has the same length as the number of holes. But if the list is shorter than the number of holes, only the first value in the list is used:
df.fillna(value=[100, 101])
     A    B             C
0  100   10  [20, 21, 22]
1    1  100  [23, 24, 25]
2    2   12           100
If the list is longer than the number of holes, you get something even odder: the first two holes get 100 and the last one gets 102:
df.fillna(value=[100, 101, 102, 103])
     A    B             C
0  100   10  [20, 21, 22]
1    1  100  [23, 24, 25]
2    2   12           102
If you provide a dict that specifies the fill values by column, the values from the list are used within that column only:
df.fillna(value={'C': [100, 101]})
     A    B             C
0  NaN   10  [20, 21, 22]
1    1  NaN  [23, 24, 25]
2    2   12           100
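Because the outcome depends on how the list length compares with the number of holes, the title suggests raising instead. A minimal sketch of that check, written as a hypothetical wrapper (strict_fillna is not a pandas function) that rejects a list or tuple passed as value, or as a per-column value inside a dict, before delegating to DataFrame.fillna:

```python
import numpy as np
import pandas as pd


def strict_fillna(df: pd.DataFrame, value):
    """Hypothetical wrapper: refuse list/tuple fill values instead of
    letting fillna() decide how to spread them over the missing cells."""
    # Reject a bare list/tuple passed as `value`.
    if isinstance(value, (list, tuple)):
        raise TypeError(
            f'"value" must be a scalar or dict, not {type(value).__name__!r}'
        )
    # Reject per-column lists inside a dict too, since those are also
    # spread over the holes of that column in an ambiguous way.
    if isinstance(value, dict):
        for col, fill in value.items():
            if isinstance(fill, (list, tuple)):
                raise TypeError(
                    f"fill value for column {col!r} must be a scalar, "
                    f"not {type(fill).__name__!r}"
                )
    # Scalars and dicts of scalars go straight through to pandas.
    return df.fillna(value=value)


df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, 12]})
print(strict_fillna(df, {'A': 100, 'B': 101}))
```

With this wrapper, strict_fillna(df, [100, 101]) and strict_fillna(df, {'A': [100, 101]}) both raise TypeError. The trade-off is that rejecting per-column lists also rules out using a list as the fill value itself.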
Since it's not always practical to know the number of NaN values a priori, or to customize the length of the value list to match it, this is problematic. Furthermore, some desired values get over-interpreted and cannot be used:
For example, I want to replace every NaN in a single column with the same list (empty or non-empty), but I can't figure out how to do it:
df.fillna(value={'C': [[100,101]]})
     A    B             C
0  NaN   10  [20, 21, 22]
1    1  NaN  [23, 24, 25]
2    2   12           100
And if you specify an empty list, nothing is filled at all:
df.fillna(value={'C': list()})
     A    B             C
0  NaN   10  [20, 21, 22]
1    1  NaN  [23, 24, 25]
2    2   12           NaN
But a dict works fine:
df.fillna(value={'C': {0: 1}})
     A    B             C
0  NaN   10  [20, 21, 22]
1    1  NaN  [23, 24, 25]
2    2   12        {0: 1}
df.fillna(value={'C': dict()})
     A    B             C
0  NaN   10  [20, 21, 22]
1    1  NaN  [23, 24, 25]
2    2   12            {}
So it appears that fillna() makes a lot of decisions about how the fill values should be applied, and certain desired outcomes can't be achieved because it is being too 'clever'.
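For the case of filling every NaN in column C with the same list, one workaround is to bypass fillna() and replace the missing scalars directly. This is a minimal sketch, assuming the column is object dtype and that "missing" means a scalar NaN/None; fill_with_list is a hypothetical helper, not part of pandas:

```python
import numpy as np
import pandas as pd


def fill_with_list(series: pd.Series, fill: list) -> pd.Series:
    """Hypothetical helper: replace every missing scalar in an object
    column with its own copy of `fill`, leaving existing lists alone."""

    def replace(v):
        # pd.isna() on a list returns an array, so only test scalars.
        if pd.api.types.is_scalar(v) and pd.isna(v):
            return list(fill)  # fresh copy so cells do not share one list
        return v

    # dtype=object keeps each list as a single cell value.
    return pd.Series([replace(v) for v in series], index=series.index,
                     dtype=object, name=series.name)


df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, 12],
                   'C': [[20, 21, 22], [23, 24, 25], np.nan]})
df['C'] = fill_with_list(df['C'], [100, 101])
print(df)
```

Passing [] instead of [100, 101] fills the holes with empty lists, which covers the empty-list case as well. Each hole gets its own copy via list(fill), so mutating one cell later does not silently change the others.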