**Describe the bug**
I just installed RMM and it upgraded msgpack-python from 0.6.2 to 1.0.0. When running with either a 0.11 or 0.12 dask_cudf and msgpack 1.0.0, some groupby queries fail with `distributed.protocol.core - CRITICAL - Failed to deserialize` and `ValueError: tuple is not allowed for map key`.
On a separate system running the 0.13 nightly packages, msgpack-python is version 0.6.2 and the query completes fine.
This may be similar to the Dask distributed issue from 8 days ago: dask/distributed#3491
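If the msgpack 1.0.0 upgrade is indeed the trigger, pinning the package back may serve as a stopgap while this is investigated. A sketch, assuming a conda-forge environment (the pin spec is an assumption, not a confirmed fix):

```shell
# Roll msgpack-python back to the last 0.6.x release in the current env.
# Assumption: the msgpack 1.0 defaults are what break header deserialization.
conda install -c conda-forge "msgpack-python<1.0"
```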
**Error output**

```
distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/protocol/core.py", line 106, in loads
    header = msgpack.loads(header, use_list=False, **msgpack_opts)
  File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
ValueError: tuple is not allowed for map key
distributed.core - ERROR - tuple is not allowed for map key
Traceback (most recent call last):
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/core.py", line 456, in handle_stream
    msgs = await comm.read()
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/comm/tcp.py", line 212, in read
    frames, deserialize=self.deserialize, deserializers=deserializers
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/comm/utils.py", line 69, in from_frames
    res = _from_frames()
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/comm/utils.py", line 55, in _from_frames
    frames, deserialize=deserialize, deserializers=deserializers
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/protocol/core.py", line 106, in loads
    header = msgpack.loads(header, use_list=False, **msgpack_opts)
  File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
```
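The failure pattern matches msgpack 1.0's new `strict_map_key` default, under which unpacked maps may only have `str` or `bytes` keys, while distributed's serialization headers can carry tuple keys. A minimal stdlib-only sketch of that check (the `check_map_keys` helper and the sample header are illustrative assumptions, not msgpack's actual implementation):

```python
def check_map_keys(obj, strict_map_key=True):
    """Mimic msgpack 1.0's map-key validation (simplified model)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if strict_map_key and not isinstance(key, (str, bytes)):
                raise ValueError(f"{type(key).__name__} is not allowed for map key")
            check_map_keys(value, strict_map_key)
    return obj

# Hypothetical header with a tuple key, like the ones distributed exchanges.
header = {("task-key", 0): {"lengths": (10,)}}

try:
    check_map_keys(header)                    # msgpack >= 1.0 default: rejected
except ValueError as e:
    print(e)                                  # tuple is not allowed for map key

check_map_keys(header, strict_map_key=False)  # pre-1.0 behavior: accepted
```

In the real library the equivalent knob would be passing `strict_map_key=False` to the unpack call, which is presumably what distributed needs to do (or did implicitly under 0.6.2).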
**Steps/Code to reproduce bug**

```python
import dask_cudf as dcu
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster()
print(cluster)
client = Client(cluster)
client

fn = 'test.csv'
lines = """id3,id4,id5,id6,v1,v2
id0000011793,51,10,59276,1,1
id0000006000,12,58,78315,4,1
id0000012244,25,9,27300,4,5
id0000006000,54,38,65416,2,3
id0000029319,72,92,19046,4,3
id0000068931,87,74,60479,3,2
id0000011793,6,32,90599,4,5
id0000033725,89,85,8657,3,3
id0000006000,12,26,19634,5,2
id0000011793,76,23,38595,5,4
"""
with open(fn, 'w') as fp:
    fp.write(lines)

x = dcu.read_csv(fn, n_partitions=2)
x['id3'] = x['id3'].astype('category')

# max v1 - min v2 by id3
ans = x.groupby(['id3']).agg({'v1': 'max', 'v2': 'min'}).compute()
ans['range_v1_v2'] = ans['v1'] - ans['v2']
```
**Expected behavior**
This output for `ans`:

```
              v1  v2  range_v1_v2
id3
id0000006000   5   1            4
id0000011793   5   1            4
id0000012244   4   5           -1
id0000029319   4   3            1
id0000033725   3   3            0
id0000068931   3   2            1
```
**Environment overview (please complete the following information)**
- Environment location: [Bare-metal]
- Method of cuDF install: conda
**Environment details**

```
msgpack-python    1.0.0    py36hc9558a2_0    conda-forge    <-- possible problem package
```
**Additional context**
Dask distributed had a similar issue reported 8 days ago: dask/distributed#3491