[BUG] dask_cudf breaks with msgpack-python 1.0.0 after RMM conda install #4254

@taureandyernv

Describe the bug
I just installed RMM, and the install upgraded msgpack-python from 0.6.2 to 1.0.0. With msgpack 1.0.0 on either dask-cudf 0.11 or 0.12, some groupby queries fail with: distributed.protocol.core - CRITICAL - Failed to deserialize, ValueError: tuple is not allowed for map key

On a separate system running the 0.13 nightlies, msgpack-python is at version 0.6.2 and the same query completes fine.

This may be related to the Dask distributed issue opened 8 days ago: dask/distributed#3491

Error output

distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/protocol/core.py", line 106, in loads
    header = msgpack.loads(header, use_list=False, **msgpack_opts)
  File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
ValueError: tuple is not allowed for map key
distributed.core - ERROR - tuple is not allowed for map key
Traceback (most recent call last):
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/core.py", line 456, in handle_stream
    msgs = await comm.read()
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/comm/tcp.py", line 212, in read
    frames, deserialize=self.deserialize, deserializers=deserializers
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/comm/utils.py", line 69, in from_frames
    res = _from_frames()
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/comm/utils.py", line 55, in _from_frames
    frames, deserialize=deserialize, deserializers=deserializers
  File "/home/taurean/miniconda3/envs/rapids12/lib/python3.6/site-packages/distributed/protocol/core.py", line 106, in loads
    header = msgpack.loads(header, use_list=False, **msgpack_opts)
  File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
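
The failing call in the traceback is msgpack.loads(header, use_list=False, ...). A minimal sketch of what may be happening, independent of Dask and cuDF: with use_list=False, msgpack arrays decode to tuples, and msgpack 1.0 changed the default of strict_map_key to True, which rejects map keys that are not str or bytes. The payload below is a made-up stand-in, not an actual distributed header, and try_unpack is a hypothetical helper.

```python
import msgpack


def try_unpack(payload, **opts):
    """Unpack with use_list=False; return (result, error_message)."""
    try:
        result = msgpack.unpackb(payload, use_list=False, raw=False, **opts)
        return result, None
    except ValueError as exc:
        return None, str(exc)


# Hypothetical payload: a map whose key is an array.
# With use_list=False that key is decoded as a tuple.
payload = msgpack.packb({(1, 2): "value"})

# Default options: expected to fail on msgpack >= 1.0, succeed on 0.6.x.
default_result, default_error = try_unpack(payload)

# Explicitly relaxing the key check: expected to succeed on both.
relaxed_result, relaxed_error = try_unpack(payload, strict_map_key=False)

print("msgpack version:", msgpack.version)
print("default options:     ", default_result, default_error)
print("strict_map_key=False:", relaxed_result, relaxed_error)
```

If the default call fails here with the same message as in the traceback, the problem is likely in how the header is unpacked rather than in the groupby itself. Whether distributed should pass strict_map_key=False is a question for dask/distributed#3491.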

Steps/Code to reproduce bug

import dask_cudf as dcu
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster()
print(cluster)
client = Client(cluster)
client

fn = 'test.csv'
lines = """id3,id4,id5,id6,v1,v2
id0000011793,51,10,59276,1,1
id0000006000,12,58,78315,4,1
id0000012244,25,9,27300,4,5
id0000006000,54,38,65416,2,3
id0000029319,72,92,19046,4,3
id0000068931,87,74,60479,3,2
id0000011793,6,32,90599,4,5
id0000033725,89,85,8657,3,3
id0000006000,12,26,19634,5,2
id0000011793,76,23,38595,5,4
"""
with open(fn, 'w') as fp:
    fp.write(lines)
x = dcu.read_csv(fn, n_partitions=2)
x['id3'] = x['id3'].astype('category')

# max v1 - min v2 by id3
ans = x.groupby(['id3']).agg({'v1': 'max', 'v2': 'min'}).compute()
ans['range_v1_v2'] = ans['v1'] - ans['v2']

Expected behavior
The query should complete and ans should contain:

              v1  v2  range_v1_v2
id3
id0000006000   5   1            4
id0000011793   5   1            4
id0000012244   4   5           -1
id0000029319   4   3            1
id0000033725   3   3            0
id0000068931   3   2            1
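
The expected values can be cross-checked without a GPU or a Dask cluster. The sketch below recomputes max(v1), min(v2) and their difference per id3 using only the Python standard library, on the same ten rows as the reproduction script. The function name expected_ranges is hypothetical and the code is not part of the failing pipeline.

```python
import csv
import io

# Same rows as test.csv in the reproduction script.
CSV_TEXT = """id3,id4,id5,id6,v1,v2
id0000011793,51,10,59276,1,1
id0000006000,12,58,78315,4,1
id0000012244,25,9,27300,4,5
id0000006000,54,38,65416,2,3
id0000029319,72,92,19046,4,3
id0000068931,87,74,60479,3,2
id0000011793,6,32,90599,4,5
id0000033725,89,85,8657,3,3
id0000006000,12,26,19634,5,2
id0000011793,76,23,38595,5,4
"""


def expected_ranges(csv_text):
    """Return {id3: (max_v1, min_v2, max_v1 - min_v2)}, sorted by id3."""
    stats = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        v1, v2 = int(row["v1"]), int(row["v2"])
        seen = stats.get(row["id3"])
        if seen is None:
            stats[row["id3"]] = (v1, v2)
        else:
            stats[row["id3"]] = (max(seen[0], v1), min(seen[1], v2))
    return {key: (mx, mn, mx - mn) for key, (mx, mn) in sorted(stats.items())}


result = expected_ranges(CSV_TEXT)
for key, (v1_max, v2_min, diff) in result.items():
    print(key, v1_max, v2_min, diff)
```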

Environment overview (please complete the following information)

  • Environment location: [Bare-metal]
  • Method of cuDF install: conda

Environment details
msgpack-python 1.0.0 py36hc9558a2_0 conda-forge <-- possible problem package
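
To check quickly whether another environment has the suspect package version, something like the following sketch may help. It assumes the installed distribution is registered under the name "msgpack" (the conda package is called msgpack-python, so the name may differ), and it treats any major version of 1 or higher as the failing range based only on the two data points in this report (1.0.0 fails, 0.6.2 works). It uses importlib.metadata, which needs Python 3.8 or newer, so it will not run as-is in the Python 3.6 environment shown above. Both function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="msgpack"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def in_reported_failing_range(version_string):
    """Assumption from this report: 1.0.0 fails, 0.6.2 works."""
    major = int(version_string.split(".")[0])
    return major >= 1


found = installed_version()
if found is None:
    print("msgpack is not installed in this environment")
elif in_reported_failing_range(found):
    print("msgpack", found, "is in the range where the failure was reported")
else:
    print("msgpack", found, "matches the range that worked in this report")
```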

Additional context
Dask distributed had a similar issue reported 8 days ago: dask/distributed#3491
