@@ -808,14 +808,14 @@ signals that all requested workers have been launched. Hence the :func:`launch`
808808as all the requested workers have been launched.
809809
810810Newly launched workers are connected to each other, and the master process, in an all-to-all manner.
811- Specifying command argument, ``--worker`` results in the launched processes initializing themselves
811+ Specifying command argument, ``--worker <cookie>`` results in the launched processes initializing themselves
812812as workers and connections being set up via TCP/IP sockets. Optionally ``--bind-to bind_addr[:port]``
813813may also be specified to enable other workers to connect to it at the specified ``bind_addr`` and ``port``.
814814This is useful for multi-homed hosts.
815815
816816For non-TCP/IP transports, for example, an implementation may choose to use MPI as the transport,
817- ``--worker`` must NOT be specified. Instead newly launched workers should call ``init_worker()``
818- before using any of the parallel constructs
817+ ``--worker`` must NOT be specified. Instead newly launched workers should call ``init_worker(cookie)``
818+ before using any of the parallel constructs.
819819
820820For every worker launched, the :func:`launch` method must add a :class:`WorkerConfig`
821821object (with appropriate fields initialized) to ``launched`` ::
@@ -918,7 +918,7 @@ When using custom transports:
918918 workers defaulting to the TCP/IP socket transport implementation
919919- For every incoming logical connection with a worker, ``Base.process_messages(rd::AsyncStream, wr::AsyncStream)`` must be called.
920920 This launches a new task that handles reading and writing of messages from/to the worker represented by the ``AsyncStream`` objects
921- - ``init_worker(manager::FooManager)`` MUST be called as part of worker process initialization
921+ - ``init_worker(cookie, manager::FooManager)`` MUST be called as part of worker process initialization
922922- Field ``connect_at::Any`` in :class:`WorkerConfig` can be set by the cluster manager when ``launch`` is called. The value of
923923 this field is passed in all ``connect`` callbacks. Typically, it carries information on *how to connect* to a worker. For example,
924924 the TCP/IP socket transport uses this field to specify the ``(host, port)`` tuple at which to connect to a worker
@@ -929,6 +929,54 @@ implementation simply executes an ``exit()`` call on the specified remote worker
929929
930930``examples/clustermanager/simple`` is an example that shows a simple implementation using Unix domain sockets for cluster setup
931931
932+ Network requirements for LocalManager and SSHManager
933+ ----------------------------------------------------
934+ Julia clusters are designed to run in already secured environments, on infrastructure ranging from local laptops
935+ to departmental clusters or even the Cloud. This section covers network security requirements for the built-in ``LocalManager``
936+ and ``SSHManager``.
937+
938+ - The master process does not listen on any port. It only connects out to the workers.
939+
940+ - Each worker binds to only one of the local interfaces and listens on the first free port starting from 9009.
941+
942+ - ``LocalManager``, i.e. ``addprocs(N)``, by default binds only to the loopback interface.
943+ This means that workers subsequently started on remote hosts, or anyone with malicious intentions,
944+ are unable to connect to the cluster. An ``addprocs(4)`` followed by an ``addprocs(["remote_host"])``
945+ will fail. Some users may need to create a cluster comprising their local system and a few remote systems. This can be done by
946+ explicitly requesting ``LocalManager`` to bind to an external network interface via the ``restrict`` keyword
947+ argument: ``addprocs(4; restrict=false)``.
948+
949+ - ``SSHManager``, i.e. ``addprocs(list_of_remote_hosts)``, launches workers on remote hosts via SSH.
950+ Note that by default SSH is used only to launch Julia workers.
951+ Subsequent master-worker and worker-worker connections use plain, unencrypted TCP/IP sockets. The remote hosts
952+ must have passwordless login enabled. Additional SSH flags or credentials may be specified via the keyword
953+ argument ``sshflags``.
954+
955+ - ``addprocs(list_of_remote_hosts; tunnel=true, sshflags=<ssh keys and other flags>)`` is useful when we wish to use
956+ SSH connections for master-worker traffic too. A typical scenario for this is a local laptop running the Julia REPL (i.e., the master)
957+ with the rest of the cluster on the Cloud, say on Amazon EC2. In this case only port 22 needs to be opened on the remote cluster, with
958+ SSH clients authenticated via PKI. ``sshflags`` can specify ``-i <keyfile>`` for this purpose.
959+
960+ Note that worker-worker connections are still plain TCP and the local security policy on the remote cluster
961+ must allow for free connections between worker nodes, at least for ports 9009 and above.
962+
963+ Securing and encrypting all worker-worker traffic (via SSH) or encrypting individual messages can be done via
964+ a custom ClusterManager.
965+
966+ Cluster cookie
967+ --------------
968+ - All processes in a cluster share the same cookie which, by default, is a randomly generated string on the master process.
969+ - ``cluster_cookie()`` returns the cookie, ``cluster_cookie(cookie)`` sets it.
970+ - All connections are authenticated on both sides to ensure that only workers started by the master are allowed
971+ to connect to each other.
972+ - The cookie must be passed to the workers at startup via argument ``--worker <cookie>``.
973+ Custom ``ClusterManagers`` can retrieve the cookie on the master by calling
974+ ``cluster_cookie()``. Cluster managers not using the default TCP/IP transport (and hence not specifying ``--worker``)
975+ must call ``init_worker(cookie, manager)`` with the same cookie as on the master.
976+
977+ Note that environments requiring higher levels of security can implement this via a custom ClusterManager.
978+ For example, cookies can be pre-shared and hence not specified as a startup argument.
979+
932980
933981Specifying network topology (Experimental)
934982-------------------------------------------