
Conversation

@hmacdope (Contributor)

Fixes #229 (Provide parameter for "dask-worker").
Depends on rapidsai/dask-cuda#1181.

Hi all!

A basic implementation of calling worker CLIs other than the distributed CLI.

This will require rapidsai/dask-cuda#1181, which adds a main entrypoint to dask-cuda-worker, to be merged first.

I have had to add a bit of a shim to filter out some CLI args that are not shared between the dask-worker and dask-cuda-worker CLIs.
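For illustration, here is a minimal sketch of the kind of shim meant here; the function name and the option chosen are hypothetical, not the merged code:

# Hypothetical sketch: drop options that dask-worker accepts but
# dask-cuda-worker does not (--nworkers is assumed as an example).
UNSHARED_ARGS = {"--nworkers"}

def filter_worker_args(args):
    """Remove unshared options (and their values) from a dask-worker arg list."""
    out = []
    skip_value = False
    for arg in args:
        if skip_value:
            skip_value = False
            continue
        if arg.split("=", 1)[0] in UNSHARED_ARGS:
            skip_value = "=" not in arg  # value passed as the next token
            continue
        out.append(arg)
    return out

print(filter_worker_args(["--nworkers", "2", "--nthreads", "1"]))
# -> ['--nthreads', '1']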

A very basic test to see that it works:

from dask import distributed
from dask_jobqueue.local import LocalCluster

# point the generated job script at the dask-cuda CLI instead of the
# default distributed.cli.dask_worker
lc_gpu = LocalCluster(worker_command="dask_cuda.cli", cores=2, memory="2GB")
client = distributed.Client(lc_gpu)
lc_gpu.scale(2)
print(lc_gpu.job_script())

which prints a worker launch line of the form:

/home/hmacdope/anaconda3/envs/dask_dev/bin/python -m dask_cuda.cli tcp://192.168.1.5:33677 --name dummy-name --nthreads 1 --memory-limit 0.93GiB --death-timeout 60

I am a new contributor, so please let me know if I am missing anything obvious.

@hmacdope (Contributor Author)

Another option we could pursue is to call them as scripts without python -m. Let me know what you think.
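For reference, the two invocation styles being weighed would look roughly like this, assuming the dask-cuda-worker console script as the example. The module form, tied to the job script's interpreter:

/home/hmacdope/anaconda3/envs/dask_dev/bin/python -m dask_cuda.cli tcp://192.168.1.5:33677 --nthreads 1

versus the script form, resolved through PATH:

dask-cuda-worker tcp://192.168.1.5:33677 --nthreads 1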

@guillaumeeb (Member) left a comment

Thanks a lot @hmacdope for contributing here!!

Just made a comment on the code, and it would be really nice to add the test you propose in your first post.

death_timeout=None,
local_directory=None,
extra=None,
worker_command="distributed.cli.dask_worker",
@guillaumeeb (Member)

To be consistent with other kwargs, the default here should be None, and the true default defined in the configuration file. See how other kwargs are handled.
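For readers following along, this is roughly the pattern dask-jobqueue uses to resolve such kwargs. A simplified sketch; the config key name is assumed from the kwarg name, not copied from the merged code:

import dask

def resolve_worker_command(worker_command=None, config_name="slurm"):
    # the kwarg defaults to None; the real default lives in the
    # jobqueue YAML configuration under the cluster's config section
    if worker_command is None:
        worker_command = dask.config.get(
            "jobqueue.%s.worker-command" % config_name, default=None
        )
    return worker_command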

@hmacdope (Contributor Author)

Done!

@hmacdope (Contributor Author) commented Jun 2, 2023

@guillaumeeb hopefully I have addressed your comments.

@hmacdope requested a review from guillaumeeb, June 2, 2023 04:58
@guillaumeeb (Member) left a comment

This looks good to me! We don't need to wait for the dask-cuda issue to be fixed to merge, right?

Also, if at some point you could add a bit of documentation about using this new option, that would be really nice!
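As a sketch of what such documentation might show (SLURMCluster is an arbitrary choice here; worker_command is the kwarg this PR adds to all cluster classes):

from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=2,
    memory="2GB",
    # launch workers via the dask-cuda CLI instead of the default dask-worker
    worker_command="dask_cuda.cli",
)
print(cluster.job_script())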

@guillaumeeb (Member)

We'll also wait till the CI is green!

@hmacdope (Contributor Author) commented Jun 4, 2023

> This looks good to me! We don't need to wait for the dask-cuda issue to be fixed to merge, right?

It is merged!

> Also, if at some point you could add a bit of documentation about using this new option, that would be really nice!

Yes, I will raise an issue.

@hmacdope (Contributor Author) commented Jun 4, 2023

@guillaumeeb any idea why the container builds are failing?

@guillaumeeb (Member)

Nope, I need to check these. The container builds might not be a problem, but the CI / build (none) failure is. Could you try to check this one?

@guillaumeeb (Member)

I'm not sure what the problem is with the build; it looks like the image takes too much time to build. Weird, more than 6 hours!

@jacobtomlinson (Member)

dask/distributed recently dropped Python 3.8, and I noticed that the failing CI job uses it; maybe that is the problem?

https://github.com/dask/distributed/blob/129b7cb70e2b77f4e13e27aefe9a7dbfc31a53e4/pyproject.toml#L26

@hmacdope (Contributor Author)

@guillaumeeb need anything more from me here?

@hmacdope (Contributor Author) commented Jul 7, 2023

@guillaumeeb @jacobtomlinson @lesteve it would be great to get this finalised if possible.

@guillaumeeb (Member)

Sorry for the delay here @hmacdope, I wanted to check the CI but didn't have the time. I'll merge anyway. Would you need an official release at some point?

@guillaumeeb merged commit 4bbd0a0 into dask:main, Jul 16, 2023
@hmacdope (Contributor Author)

> Sorry for the delay here @hmacdope, I wanted to check the CI but didn't have the time. I'll merge anyway. Would you need an official release at some point?

Thanks so much @guillaumeeb! No worries, everyone is busy. :) All good without an official release.

@lesteve mentioned this pull request, Feb 21, 2024