Splitting
Splitting, or "fan out", refers to a step whose Job creates multiple outputs that are expected to be processed by multiple instances of a subsequent step, with each instance running concurrently with its peers (in parallel). "Splitter" Jobs typically cut up a large file into smaller files so that each one can be processed separately, in parallel.
The Workflow Engine determines that a step is to be run multiple times (in parallel) by inspecting the plumbing that refers to a prior step's output. If an input to a step comes from an output variable whose type (according to the Job Definition) is files, then the step is run once for every file generated by the prior step.
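The rule above can be sketched in a few lines of Python. This is only an illustration, not the Workflow Engine's actual code: the function name `is_fan_out_input` and the shape of the `job_definitions` dictionary (keyed by step name, with an `outputs` mapping) are assumptions made for the example. Only the plumbing keys (`from-step`, `from-workflow`, `name`, `variable`) and the type `files` come from this document.

```python
def is_fan_out_input(plumbing_entry, job_definitions):
    """Return True if a plumbing entry draws on a prior step's 'files' output.

    job_definitions is a hypothetical, simplified lookup:
    step name -> {"outputs": {variable name -> {"type": ...}}}
    """
    source = plumbing_entry.get("from-step")
    if source is None:
        # For example a from-workflow input: not a prior step's output.
        return False
    outputs = job_definitions[source["name"]]["outputs"]
    return outputs.get(source["variable"], {}).get("type") == "files"


# Hypothetical Job Definition summary for a step named 'split'.
definitions = {"split": {"outputs": {"outputBase": {"type": "files"}}}}

entry = {
    "variable": "inputFile",
    "from-step": {"name": "split", "variable": "outputBase"},
}
print(is_fan_out_input(entry, definitions))
```

An input plumbed with `from-workflow` never triggers a fan-out in this sketch, because it does not refer to a prior step's output.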
Here's an example workflow: -
- name: split
  description: Split an input file
  specification:
    collection: demo
    job: splitsmiles
    version: "1.0.0"
  plumbing:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules
- name: parallel
  description: Add some params
  specification:
    collection: demo
    job: append-col
    version: "1.0.0"
  plumbing:
  - variable: inputFile
    from-step:
      name: split
      variable: outputBase
In the above example, the parallel step relies on the outputBase variable of the split step. When the Workflow Engine decides to run the parallel step it inspects the split step's Job Definition (version 1.0.0 of the splitsmiles job in the demo collection). The engine looks specifically for the definition of the job's outputBase variable. If the variable is found to be of type files then the parallel step is launched once for every file generated by the split step, because the splitsmiles Job declares that this output can be one or more files.
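The "one instance per file" behaviour can be pictured with a small Python sketch. It is an assumption-laden illustration rather than a description of how the Workflow Engine launches Jobs: `run_step_instance` and `fan_out` are hypothetical names, the file names are invented, and a thread pool merely stands in for whatever mechanism the engine really uses to run instances concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_step_instance(step_name, input_file):
    # Hypothetical placeholder for launching a Job instance;
    # here it only reports what would be run.
    return f"{step_name}({input_file})"


def fan_out(step_name, generated_files):
    """Run one instance of step_name per file, concurrently.

    Results are returned in the same order as generated_files.
    """
    if not generated_files:
        return []
    with ThreadPoolExecutor(max_workers=len(generated_files)) as pool:
        return list(
            pool.map(lambda f: run_step_instance(step_name, f), generated_files)
        )


# Invented file names standing in for whatever the split step generated.
print(fan_out("parallel", ["chunk_1.smi", "chunk_2.smi", "chunk_3.smi"]))
```

If the split step generated three files, the parallel step would have three instances in this sketch, each receiving a different file as its inputFile.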