Add support for latest-generation Google Cloud machine families and boot disk type configuration #6616
Problem
Two critical limitations prevented full utilization of Google Cloud Batch capabilities:
1. Missing support for latest-generation machine families
Google Cloud has introduced several new general-purpose machine families that are not currently supported by Nextflow:
These families offer significant improvements in:
Without this support, users cannot leverage:
2. Inability to specify boot disk type
Currently, Nextflow only allows configuring the boot disk size via google.batch.bootDiskSize, but not the disk type. This creates several issues:

Compatibility problems:
- pd-balanced disks (the Google Cloud default)
- hyperdisk-balanced or pd-ssd

Performance optimization:
- pd-ssd (higher IOPS)
- pd-standard (lower cost)

Reference: Google Cloud Disk Types Documentation
Solution
This PR addresses both issues with a comprehensive solution:
1. Add support for latest-generation machine families
Machine type recognition:
- GENERAL_PURPOSE_FAMILIES
- -lssd suffix

Testing:
2. Add bootDiskType configuration option
New configuration parameter:

    google {
        project = 'your-project-id'
        location = 'us-central1'
        batch {
            bootDiskSize = '50 GB'
            bootDiskType = 'hyperdisk-balanced' // NEW: Specify disk type
        }
    }

Supported disk types (Google Cloud documentation):
- pd-standard - Standard persistent disk (HDD, lowest cost)
- pd-balanced - Balanced persistent disk (SSD, default for most instances)
- pd-ssd - SSD persistent disk (highest performance)
- hyperdisk-balanced - Hyperdisk balanced (required for C4/N4 families)

Key features:
- bootDiskImage when both are specified

Changes
Core Implementation
GoogleBatchMachineTypeSelector.groovy
- GENERAL_PURPOSE_FAMILIES constant for C4/N4 family detection
- isHyperdiskOnly() method to identify families requiring Hyperdisk
- findValidLocalSSDSize() to handle C4/C4D local SSD variants

BatchConfig.groovy
- bootDiskType field with @ConfigOption annotation

GoogleBatchTaskHandler.groovy
- bootDiskType when specified
- bootDiskType is used with instance templates

Tests
GoogleBatchMachineTypeSelectorTest.groovy
- isHyperdiskOnly() behavior for all new families

BatchConfigTest.groovy
- bootDiskType parsing from configuration
- bootDiskType combined with other boot disk options

GoogleBatchTaskHandlerTest.groovy

Documentation
docs/reference/config.md
- google.batch.bootDiskType to configuration reference

docs/google.md

Compatibility
Testing
All tests pass:
- bootDiskType configuration

Test coverage:
Use Cases Enabled
1. Using latest-generation machines:

    process myTask {
        machineType 'c4-standard-4' // Now works!
        memory '16 GB'

        script:
        """
        # High-performance workload on a latest-generation Intel (C4) machine
        """
    }

2. Optimizing for cost:
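A minimal sketch of a cost-oriented setup, using the bootDiskType option this PR introduces. The disk size is an illustrative value, and the sketch assumes the machine type in use accepts pd-standard (families that require Hyperdisk would not):

```groovy
// nextflow.config (illustrative values, not taken from the PR)
google {
    batch {
        bootDiskSize = '50 GB'        // existing option
        bootDiskType = 'pd-standard'  // HDD-backed persistent disk, lowest cost
    }
}
```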
3. Optimizing for performance:
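A minimal sketch of a performance-oriented setup, written in dot notation. The disk size is an illustrative value, and the sketch assumes the selected machine family supports pd-ssd boot disks:

```groovy
// nextflow.config (illustrative values, not taken from the PR)
google.batch.bootDiskSize = '100 GB'   // existing option
google.batch.bootDiskType = 'pd-ssd'   // SSD persistent disk, higher IOPS
```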
4. Using new machine families:

    // workflow script
    process highPerf {
        machineType 'c4a-standard-8' // Google Axion (Arm-based)

        script:
        """
        # Requires hyperdisk-balanced or pd-ssd
        """
    }

    // nextflow.config
    google.batch.bootDiskType = 'hyperdisk-balanced' // Compatible with C4A

References