3.13.2 config file error upon startup for UnkillableStepTimeout and MessageTimeout #6922

@gwolski

Using Rocky 8.10 with APC 3.13.2.

At startup, both the head node and the compute nodes log unsettling error messages:

Head node:
/var/log/messages-20250713:Jul 8 18:04:35 ip-10-6-6-5 slurmctld[1199686]: slurmctld: error: UnkillableStepTimeout must be at least 5 times greater than MessageTimeout, otherwise nodes may go down with the reason "KillTaskFailed". Current values: UnkillableStepTimeout=180, MessageTimeout=60

Compute node:
Jul 19 16:06:24 ip-10-6-4-238 slurmd[4569]: slurmd: error: UnkillableStepTimeout must be at least 5 times greater than MessageTimeout, otherwise nodes may go down with the reason "KillTaskFailed". Current values: UnkillableStepTimeout=180, MessageTimeout=60
Jul 19 16:06:24 ip-10-6-4-238 slurmd[4569]: error: UnkillableStepTimeout must be at least 5 times greater than MessageTimeout, otherwise nodes may go down with the reason "KillTaskFailed". Current values: UnkillableStepTimeout=180, MessageTimeout=60
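
For reference, the rule quoted in these messages can be checked by hand: with MessageTimeout=60, UnkillableStepTimeout would need to be at least 5 × 60 = 300, and the configured 180 falls short of that. Below is a minimal Python sketch of the same check. It assumes "at least 5 times greater" means `>=`, that both settings appear as plain `Key=Value` integer lines in slurm.conf-style text, and the helper names (`parse_timeouts`, `min_unkillable_timeout`, `satisfies_rule`) are hypothetical, not part of Slurm or ParallelCluster.

```python
import re

# Match the two settings case-insensitively; only plain integer values are handled.
_PATTERN = re.compile(
    r"^\s*(UnkillableStepTimeout|MessageTimeout)\s*=\s*(\d+)", re.IGNORECASE
)
_CANONICAL = {
    "unkillablesteptimeout": "UnkillableStepTimeout",
    "messagetimeout": "MessageTimeout",
}


def parse_timeouts(conf_text):
    """Return the two timeout settings found in slurm.conf-style text."""
    found = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = _PATTERN.match(line)
        if match:
            found[_CANONICAL[match.group(1).lower()]] = int(match.group(2))
    return found


def min_unkillable_timeout(message_timeout, factor=5):
    """Smallest UnkillableStepTimeout allowed by the rule in the log message."""
    return factor * message_timeout


def satisfies_rule(unkillable, message, factor=5):
    """True when UnkillableStepTimeout is at least factor * MessageTimeout."""
    return unkillable >= min_unkillable_timeout(message, factor)


# Values taken from the log lines above.
sample = """
MessageTimeout=60
UnkillableStepTimeout=180
"""
timeouts = parse_timeouts(sample)
ok = satisfies_rule(timeouts["UnkillableStepTimeout"], timeouts["MessageTimeout"])
required = min_unkillable_timeout(timeouts["MessageTimeout"])
print("ok" if ok else "UnkillableStepTimeout should be at least %d" % required)
```

With the values from the log, the check fails, which matches the error that slurmctld and slurmd report.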
