Description
The dask.distributed dynamic task scheduler could be replicated in other languages with low-to-moderate effort. This would require someone to build Client and Worker objects in the other language that communicate with the same Scheduler, which contains most of the logic but is fortunately language agnostic. More specifically, there are three players in a dask.distributed cluster, only two of which would need to be rewritten:
- Client: the object users use to submit tasks to the scheduler. It would need to be rewritten but is fairly simple. It needs to know how to serialize functions, encode messages with msgpack, and send data over a socket.
- Worker: a process running on a remote node that performs the actual tasks. It would need to be rewritten but is also fairly simple. It needs to know how to communicate over a socket, deserialize functions, and execute them asynchronously in some sort of thread pool.
- Scheduler: a process to coordinate the actions of all clients and workers, ensuring that the computation proceeds to completion under various stimuli. This is very complex but would not need to be rewritten as it is language agnostic.
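To give a feel for the "encode msgpack" part of a Client, here is a minimal sketch of a msgpack encoder for a common subset of types (nil, bool, int, float, str, bin, array, map), written in Python with only the standard library. It follows the public msgpack format, but the message shape `{"op": "ping"}` used below is a hypothetical example, not the scheduler's actual protocol. In practice a port to R or Julia would use an existing msgpack library for that language; this only shows how simple the byte format is.

```python
import struct
from typing import Any, Optional, Tuple


def _sized(n: int, fix_base: Optional[int], fix_max: int,
           tags: Tuple[Optional[int], int, int]) -> bytes:
    """Header for a length-prefixed msgpack type (str, bin, array, map)."""
    tag8, tag16, tag32 = tags
    if fix_base is not None and n <= fix_max:
        return bytes([fix_base | n])                  # "fix" form: length stored in the tag byte
    if tag8 is not None and n <= 0xFF:
        return bytes([tag8, n])                       # 8-bit length
    if n <= 0xFFFF:
        return bytes([tag16]) + struct.pack(">H", n)  # 16-bit big-endian length
    return bytes([tag32]) + struct.pack(">I", n)      # 32-bit big-endian length


def _pack_int(n: int) -> bytes:
    if 0 <= n <= 0x7F:
        return bytes([n])                             # positive fixint
    if -32 <= n < 0:
        return struct.pack(">b", n)                   # negative fixint (0xe0..0xff)
    if n > 0:
        # uint 8/16/32/64: pick the smallest width that fits
        for tag, fmt in ((0xCC, ">B"), (0xCD, ">H"), (0xCE, ">I"), (0xCF, ">Q")):
            if n < 1 << (8 * struct.calcsize(fmt)):
                return bytes([tag]) + struct.pack(fmt, n)
    else:
        # int 8/16/32/64: pick the smallest width that fits
        for tag, fmt in ((0xD0, ">b"), (0xD1, ">h"), (0xD2, ">i"), (0xD3, ">q")):
            if n >= -(1 << (8 * struct.calcsize(fmt) - 1)):
                return bytes([tag]) + struct.pack(fmt, n)
    raise OverflowError(f"{n} does not fit in 64 bits")


def packb(obj: Any) -> bytes:
    """Encode None, bool, int, float, str, bytes, list/tuple and dict as msgpack."""
    if obj is None:
        return b"\xc0"
    if isinstance(obj, bool):                         # check before int: bool subclasses int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        return _pack_int(obj)
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)       # float 64
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _sized(len(data), 0xA0, 31, (0xD9, 0xDA, 0xDB)) + data
    if isinstance(obj, (bytes, bytearray)):           # long bytestrings travel as "bin"
        return _sized(len(obj), None, 0, (0xC4, 0xC5, 0xC6)) + bytes(obj)
    if isinstance(obj, (list, tuple)):
        return _sized(len(obj), 0x90, 15, (None, 0xDC, 0xDD)) + b"".join(map(packb, obj))
    if isinstance(obj, dict):
        body = b"".join(packb(k) + packb(v) for k, v in obj.items())
        return _sized(len(obj), 0x80, 15, (None, 0xDE, 0xDF)) + body
    raise TypeError(f"cannot encode {type(obj).__name__}")
```

For example, `packb({"op": "ping"}).hex()` gives `"81a26f70a470696e67"`: a one-entry fixmap (`0x81`), a two-byte fixstr key, and a four-byte fixstr value.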
About 90% of the complexity of dask.distributed is in the scheduler. Fortunately, the scheduler is also language agnostic and communicates only using msgpack and long bytestrings. It should be doable to re-implement the Client and Worker in another language like R or Julia if anyone is interested. This would require understanding the following in the other language:
- How to serialize and deserialize functions and variables in that language
- How to communicate over a network, hopefully in a non-blocking way
- How to evaluate functions using a separate thread pool
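The three requirements fit together roughly as in this toy sketch, written in Python as a stand-in for the target language. A worker accepts connections without blocking (asyncio streams), deserializes a `(function, args)` pair with `pickle`, and runs it in a `ThreadPoolExecutor` so the event loop stays free for other connections. The 8-byte length prefix, the `(function, args)` message shape, and the helper names `read_frame`, `write_frame`, `make_handler`, and `submit` are all hypothetical; the real scheduler exchanges msgpack messages with its own framing, and a real worker talks to the scheduler rather than directly to clients.

```python
import asyncio
import operator
import pickle
import struct
from concurrent.futures import ThreadPoolExecutor

# Hypothetical framing for this sketch only: 8-byte big-endian length + payload.
# This is NOT dask.distributed's actual wire format.


async def read_frame(reader: asyncio.StreamReader) -> bytes:
    (length,) = struct.unpack(">Q", await reader.readexactly(8))
    return await reader.readexactly(length)


async def write_frame(writer: asyncio.StreamWriter, payload: bytes) -> None:
    writer.write(struct.pack(">Q", len(payload)) + payload)
    await writer.drain()


def make_handler(pool: ThreadPoolExecutor):
    async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        # Deserialize the task. Only unpickle data from trusted peers:
        # unpickling can execute arbitrary code.
        func, args = pickle.loads(await read_frame(reader))
        # Run the function in the thread pool; awaiting keeps the event loop responsive.
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(pool, func, *args)
        await write_frame(writer, pickle.dumps(result))
        writer.close()
        await writer.wait_closed()
    return handle


async def submit(host: str, port: int, func, *args):
    # Toy client: send one serialized task, wait for the serialized result.
    reader, writer = await asyncio.open_connection(host, port)
    await write_frame(writer, pickle.dumps((func, args)))
    result = pickle.loads(await read_frame(reader))
    writer.close()
    await writer.wait_closed()
    return result


async def demo():
    with ThreadPoolExecutor(max_workers=4) as pool:
        server = await asyncio.start_server(make_handler(pool), "127.0.0.1", 0)
        port = server.sockets[0].getsockname()[1]   # port 0 -> OS picks a free port
        async with server:
            # Several tasks in flight at once over separate connections.
            return await asyncio.gather(
                submit("127.0.0.1", port, operator.add, 1, 2),
                submit("127.0.0.1", port, pow, 2, 10),
                submit("127.0.0.1", port, sum, [1, 2, 3]),
            )


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [3, 1024, 6]
```

One design caveat: `pickle` serializes functions by reference (module plus name), so the receiving side must be able to import the same function. That is why the demo sends built-ins, and why dask itself relies on cloudpickle for lambdas and closures. Another language would need an equivalent way to ship function definitions, not just names.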