Cross Language Client/Workers #586

@mrocklin

The Dask.distributed dynamic task scheduler could be replicated in other languages with low-to-moderate effort. This would require someone to build Client and Worker objects in the other language that communicate with the same Scheduler, which contains most of the logic but is fortunately language agnostic. More specifically, there are three players in a dask.distributed cluster, only two of which would need to be rewritten:

  1. Client: the object users call to submit tasks to the scheduler. Would need to be rewritten but is fairly simple. Needs to know how to serialize functions, encode msgpack messages, and send data over a socket.
  2. Worker: a process running on a remote node that actually performs those tasks. Would need to be rewritten but is also fairly simple. Needs to know how to communicate over a socket, deserialize functions, and execute them asynchronously in some sort of thread pool.
  3. Scheduler: a process to coordinate the actions of all clients and workers, ensuring that the computation proceeds to completion under various stimuli. This is very complex but would not need to be rewritten as it is language agnostic.
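
To make the split concrete, here is a minimal sketch of the client side: a small language-neutral header plus an opaque payload that only a same-language worker needs to understand. The message layout, the `op`/`key` field names, and the `encode_task`/`read_header` helpers are all hypothetical and are not the actual dask.distributed wire format. JSON stands in for msgpack so that the sketch needs only the standard library.

```python
import json
import operator
import pickle
import struct


def encode_task(key, func, args):
    """Frame a task as [header length][header][payload length][payload]."""
    # Header: small metadata that a scheduler in any language could parse.
    # JSON is a stand-in for msgpack here (assumption, stdlib only).
    header = json.dumps({"op": "submit", "key": key}).encode("utf-8")
    # Payload: an opaque bytestring holding the serialized function and args.
    payload = pickle.dumps((func, args))
    return (
        struct.pack("!Q", len(header)) + header
        + struct.pack("!Q", len(payload)) + payload
    )


def read_header(frame):
    """Parse only the header and hand back the payload bytes untouched."""
    (hlen,) = struct.unpack_from("!Q", frame, 0)
    header = json.loads(frame[8:8 + hlen].decode("utf-8"))
    (plen,) = struct.unpack_from("!Q", frame, 8 + hlen)
    payload = frame[16 + hlen:16 + hlen + plen]
    return header, payload


# Client side: build the frame that would be written to a socket.
frame = encode_task("add-1", operator.add, (1, 2))

# Scheduler side: route on the header without deserializing the payload.
header, payload = read_header(frame)

# Worker side: only here is the language-specific payload decoded and run.
func, args = pickle.loads(payload)
result = func(*args)
```

The point of the layout is that `read_header` never calls `pickle`, which is why a scheduler written in one language could in principle route tasks for clients and workers written in another.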

About 90% of the complexity of dask.distributed is in the scheduler. Fortunately the scheduler is also language agnostic, and communicates only using msgpack and long bytestrings. It should be doable to re-implement the Client and Worker in another language like R or Julia if anyone is interested. This would require knowing how to do the following in the other language:

  1. How to serialize and deserialize functions and variables in that language
  2. How to communicate over a network, hopefully in a non-blocking way
  3. How to evaluate functions using a separate thread pool
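
As a rough illustration of those three capabilities together, here is a toy worker in Python using only the standard library. It is a sketch under stated assumptions, not the dask.distributed Worker: the `(key, func, args)` message shape, the `None` shutdown signal, and the helper names are hypothetical, `pickle` stands in for whatever function serialization the target language offers, and the sockets are blocking for brevity even though a non-blocking design would be preferable.

```python
import operator
import pickle
import socket
import struct
import threading
from concurrent.futures import ThreadPoolExecutor


def send_msg(sock, obj):
    """Serialize obj and send it with an 8-byte length prefix."""
    data = pickle.dumps(obj)
    sock.sendall(struct.pack("!Q", len(data)) + data)


def recv_exact(sock, n):
    """Read exactly n bytes, since recv may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed message and deserialize it."""
    (n,) = struct.unpack("!Q", recv_exact(sock, 8))
    return pickle.loads(recv_exact(sock, n))


def worker_loop(sock, pool):
    """Receive tasks, run them in the thread pool, and send back results."""
    send_lock = threading.Lock()  # keep replies from interleaving on the wire

    def run_and_reply(key, func, args):
        value = func(*args)
        with send_lock:
            send_msg(sock, (key, value))

    while True:
        msg = recv_msg(sock)
        if msg is None:  # hypothetical shutdown signal
            break
        key, func, args = msg
        pool.submit(run_and_reply, key, func, args)


# A connected socket pair stands in for a real network connection.
client_sock, worker_sock = socket.socketpair()
with ThreadPoolExecutor(max_workers=2) as pool:
    thread = threading.Thread(target=worker_loop, args=(worker_sock, pool))
    thread.start()
    send_msg(client_sock, ("x", operator.add, (1, 2)))
    send_msg(client_sock, ("y", operator.mul, (3, 4)))
    # Replies may arrive in either order, so collect them by key.
    results = dict(recv_msg(client_sock) for _ in range(2))
    send_msg(client_sock, None)
    thread.join()
client_sock.close()
worker_sock.close()
```

Note that `pickle` serializes functions by reference, so this only works for functions both sides can import. Shipping arbitrary user-defined functions is the harder part of item 1 and depends heavily on the language.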
