I need to create a new column in a pandas DataFrame from the values of another column. For instance, given the city column in the following DataFrame, the new column should copy the value from the city column only if it is in a given list; otherwise the corresponding entries should be populated as "other". The following code snippet works like a charm.
But when I try to implement it with pandas assign function as below
df = pd.DataFrame({'city': ['Kolkata','Delhi','Mumbai','Bankura','Dhaka','Jaipur','Goa',
                            'Delhi','Mumbai','Kolkata'],'temp':[i for i in range(10)]})
df.assign(MajorCity = lambda x:x.city if x.city in ['Kolkata','Delhi','Mumbai'] else 'other')
I received the following error:
Error message:
ValueError Traceback (most recent call last)
in
1 df = pd.DataFrame({'city': ['Kolkata','Delhi','Mumbai','Bankura','Dhaka','Jaipur','Goa',
2 'Delhi','Mumbai','Kolkata'],'temp':[i for i in range(10)]})
----> 3 df.assign(MajorCity = lambda x:x.city if x.city in ['Kolkata','Delhi','Mumbai'] else 'other')
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in assign(self, **kwargs)
3667 if PY36:
3668 for k, v in kwargs.items():
-> 3669 data[k] = com.apply_if_callable(v, data)
3670 else:
3671 # <= 3.5: do all calculations first...
~/anaconda3/lib/python3.7/site-packages/pandas/core/common.py in apply_if_callable(maybe_callable, obj, **kwargs)
363
364 if callable(maybe_callable):
--> 365 return maybe_callable(obj, **kwargs)
366
367 return maybe_callable
in (x)
1 df = pd.DataFrame({'city': ['Kolkata','Delhi','Mumbai','Bankura','Dhaka','Jaipur','Goa',
2 'Delhi','Mumbai','Kolkata'],'temp':[i for i in range(10)]})
----> 3 df.assign(MajorCity = lambda x:x.city if x.city in ['Kolkata','Delhi','Mumbai'] else 'other')
~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
1553 "The truth value of a {0} is ambiguous. "
1554 "Use a.empty, a.bool(), a.item(), a.any() or a.all().".format(
-> 1555 self.__class__.__name__
1556 )
1557 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-72-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
The callable you pass to df.assign receives the entire DataFrame as input, whereas df.apply(callable, axis=1) passes each row to the callable.
In the case of
df.assign(MajorCity = lambda x:x.city if x.city in ['Kolkata','Delhi','Mumbai'] else 'other')
the x in the lambda function is df, so x.city is a Series equivalent to df.city. Evaluating x.city in ['a', 'b', 'c'] therefore has to reduce a whole Series of comparisons to a single truth value, which raises the error. In the apply case, x is one row of df, so x.city is a single value.
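To make the difference concrete, here is a minimal sketch of both approaches on the DataFrame from the report. The row-wise apply call is only a guess at what the working snippet looked like, and the name `major` is introduced here for illustration. The assign version uses the element-wise `Series.isin` and `Series.where` methods in place of the Python `in` operator; it is one possible way to write it, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Kolkata', 'Delhi', 'Mumbai', 'Bankura', 'Dhaka',
             'Jaipur', 'Goa', 'Delhi', 'Mumbai', 'Kolkata'],
    'temp': list(range(10)),
})

# Hypothetical name for the list of cities to keep as-is
major = ['Kolkata', 'Delhi', 'Mumbai']

# Row-wise: with axis=1 the callable receives one row at a time,
# so row.city is a single string and the `in` test is unambiguous.
by_apply = df.apply(
    lambda row: row.city if row.city in major else 'other', axis=1
)

# Whole-frame: assign passes the entire DataFrame to the callable,
# so the condition has to be element-wise. isin builds a boolean mask,
# and where keeps the city where the mask is True, 'other' elsewhere.
result = df.assign(
    MajorCity=lambda x: x.city.where(x.city.isin(major), 'other')
)

print(result)
```

Both versions should produce the same column; the assign version avoids calling a Python function once per row.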
pandas : 0.25.3
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.10
pytest : 4.6.2
hypothesis : None
sphinx : 2.1.0
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.3
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.3
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.4
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8