
Splitting

Alan B. Christie edited this page Sep 16, 2025 · 2 revisions

Splitting, or "fan out", refers to a step Job that creates multiple outputs, each of which is expected to be processed by a separate instance of a subsequent step, with every instance running concurrently with its peers (in parallel). "Splitter" Jobs typically cut up a large file into smaller files with the intention of processing each file separately, in parallel.

The Workflow Engine determines that a step is to be run multiple times (in parallel) by inspecting the plumbing that refers to a prior step's output. If an input to a step is plumbed from an output variable whose type (according to the Job Definition) is files, then the step is run once for every file generated by the prior step.

Here's an example workflow: -

- name: split
  description: Split an input file
  specification:
    collection: demo
    job: splitsmiles
    version: "1.0.0"
  plumbing:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules

- name: parallel
  description: Add some params
  specification:
    collection: demo
    job: append-col
    version: "1.0.0"
  plumbing:
  - variable: inputFile
    from-step:
      name: split
      variable: outputBase

In the above example, the parallel step relies on the outputBase variable of the split step. When the Workflow Engine decides to run the parallel step it inspects the split step's Job Definition (version 1.0.0 of the splitsmiles job in the demo collection), looking specifically for the definition of the job's outputBase variable. If the variable is found to be of type files then the parallel step is launched once for every file that the split step generated. That is because the splitsmiles Job Definition declares that this output can be one or more files.
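
The decision described above can be illustrated with a minimal Python sketch. This is not the Workflow Engine's actual code: the `JOB_OUTPUT_TYPES` lookup table, the `instances_to_launch` function, the `generated_files` mapping and the `.smi` file names are all hypothetical, introduced only to show the rule that a `from-step` input plumbed to an output of type files fans out once per generated file.

```python
# Hypothetical stand-in for the Job Definitions: maps
# (collection, job, version) to each output variable's declared type.
JOB_OUTPUT_TYPES = {
    ("demo", "splitsmiles", "1.0.0"): {"outputBase": "files"},
}

# The two steps from the example workflow, as parsed dictionaries.
split_step = {
    "name": "split",
    "specification": {"collection": "demo", "job": "splitsmiles", "version": "1.0.0"},
    "plumbing": [
        {"variable": "inputFile", "from-workflow": {"variable": "candidateMolecules"}},
    ],
}
parallel_step = {
    "name": "parallel",
    "specification": {"collection": "demo", "job": "append-col", "version": "1.0.0"},
    "plumbing": [
        {"variable": "inputFile", "from-step": {"name": "split", "variable": "outputBase"}},
    ],
}


def instances_to_launch(step, steps_by_name, generated_files):
    """Return how many instances of `step` should run (assumed rule).

    `generated_files` maps a prior step's name to the list of files it
    produced. A step with no from-step input of type "files" runs once.
    """
    for plumb in step.get("plumbing", []):
        source = plumb.get("from-step")
        if source is None:
            continue  # from-workflow plumbing never causes a fan-out here
        spec = steps_by_name[source["name"]]["specification"]
        key = (spec["collection"], spec["job"], spec["version"])
        output_type = JOB_OUTPUT_TYPES.get(key, {}).get(source["variable"])
        if output_type == "files":
            # One instance for every file the prior step generated
            return len(generated_files.get(source["name"], []))
    return 1


steps_by_name = {"split": split_step, "parallel": parallel_step}
# Hypothetical file names standing in for the split step's output
generated_files = {"split": ["chunk-1.smi", "chunk-2.smi", "chunk-3.smi"]}
print(instances_to_launch(parallel_step, steps_by_name, generated_files))
```

With three files generated by the split step, the sketch reports three instances of the parallel step, while the split step itself (whose only input comes from the workflow) runs once.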
