
Build Image fails for Rocky 9 with Docker CE on 3.11+ due to overwrite of /etc/yum/vars/releasever #6931

@keien

Description

Required Info:

  • AWS ParallelCluster version: 3.11.1, 3.13.2
  • Full cluster configuration without any credentials or personal data:
Region: us-west-2
Image:
  Name: slurm-rocky-test
  Tags:
    - Key: Name
      Value: slurm-rocky-test
  RootVolume:
    Size: 50
    Encrypted: yes
    KmsKeyId: arn:aws:kms:us-west-2:171496337684:key/4b63e407-423e-496f-b937-fd5ca6421fc4

Build:
  InstanceType: m5.4xlarge
  ParentImage: ami-022aac693cf236af2
  Tags:
    - Key: purpose
      Value: infrastructure
  SecurityGroupIds:
    - sg-74e42c12
    - sg-be56d4d8
  SubnetId: subnet-00bbd054b223b7501
  • Cluster name: N/A

Bug description and how to reproduce:

When I run pcluster build-image with a Rocky 9 parent AMI that has Docker CE installed, the image build fails. The Image Builder logs show this error:

2025-08-06T20:07:28.609Z
CmdExecution: Stderr: OS='rocky9'
PLATFORM='RHEL'

if [[ ${PLATFORM} == RHEL ]]; then
  yum -y update krb5-libs
  yum -y groupinstall development && sudo yum -y install wget jq
  if [[ ${OS} != alinux2023 ]]; then
    # Do not install curl on al2023 since curl-minimal-8.5.0-1.amzn2023* is already shipped and conflicts.
    yum -y install curl
  fi
elif [[ ${PLATFORM} == DEBIAN ]]; then
  if [[ "false" == "false" ]]; then
    # disable apt-daily.timer to avoid dpkg lock
    flock $(apt-config shell StateDir Dir::State/d | sed -r "s/.*'(.*)\/?'$/\1/")/daily_lock systemctl disable --now apt-daily.timer apt-daily.service apt-daily-upgrade.timer apt-daily-upgrade.service
    # disable unattended upgrades
    sed "/Update-Package-Lists/s/\"1\"/\"0\"/; /Unattended-Upgrade/s/\"1\"/\"0\"/;" /etc/apt/apt.conf.d/20auto-upgrades > "/etc/apt/apt.conf.d/51pcluster-unattended-upgrades"
  fi
  apt-cache search build-essential
  apt-get clean
  apt-get -y update
  apt-get -y install build-essential curl wget jq
fi
Errors during downloading metadata for repository 'docker-ce-stable':
  - Status code: 404 for https://download.docker.com/linux/rhel/9.6/x86_64/stable/repodata/repomd.xml (IP: 3.175.34.7)
Error: Failed to download metadata for repo 'docker-ce-stable': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Errors during downloading metadata for repository 'docker-ce-stable':
  - Status code: 404 for https://download.docker.com/linux/rhel/9.6/x86_64/stable/repodata/repomd.xml (IP: 3.175.34.15)
Error: Failed to download metadata for repo 'docker-ce-stable': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Errors during downloading metadata for repository 'docker-ce-stable':
  - Status code: 404 for https://download.docker.com/linux/rhel/9.6/x86_64/stable/repodata/repomd.xml (IP: 3.175.34.116)
Error: Failed to download metadata for repo 'docker-ce-stable': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

You can verify that https://download.docker.com/linux/rhel/9.6/x86_64/stable/repodata/repomd.xml returns a 404. Docker's repo file builds its baseurl from $releasever, and Docker serves all RHEL 9 releases from https://download.docker.com/linux/rhel/9/x86_64/stable/repodata/repomd.xml, so a releasever of 9.6 points at a path that does not exist.
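The two URLs can be reproduced by substituting values for $releasever and $basearch by hand. This is a minimal sketch that assumes Docker's docker-ce.repo uses baseurl=https://download.docker.com/linux/rhel/$releasever/$basearch/stable. The docker_repomd_url function is a hypothetical helper and is not part of dnf or ParallelCluster.

```shell
#!/bin/sh
# Sketch: build the repomd.xml URL that dnf requests for Docker's repo.
# Assumption: Docker's docker-ce.repo uses
#   baseurl=https://download.docker.com/linux/rhel/$releasever/$basearch/stable
set -eu

# Hypothetical helper; $1 = value of $releasever, $2 = value of $basearch.
docker_repomd_url() {
  printf 'https://download.docker.com/linux/rhel/%s/%s/stable/repodata/repomd.xml\n' "$1" "$2"
}

# Typical default on Rocky 9: $releasever is the major version only.
docker_repomd_url 9 x86_64
# After the build step pins /etc/yum/vars/releasever to 9.6 (the URL that 404s).
docker_repomd_url 9.6 x86_64

# To see the status codes yourself (requires network access):
#   curl -s -o /dev/null -w '%{http_code}\n' "$(docker_repomd_url 9.6 x86_64)"
```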

The URL appears to change because of this earlier Image Builder step, which pins $releasever to 9.6 by writing /etc/yum/vars/releasever:

2025-08-06T20:07:20.212Z
CmdExecution: Stderr: OS='rocky9'
PLATFORM='RHEL'
KERNEL_VERSION=$(uname -a)
RELEASE_VERSION='9.6'
if [[ ${PLATFORM} == RHEL ]]; then
  if [[ ${OS} == rhel9 ]] || [[ ${OS} == rocky9 ]]; then
    if [[ ! -f /etc/yum/vars/releasever ]]; then
      echo "yes" > /opt/parallelcluster/pin_releasesever
      echo ${RELEASE_VERSION} > /etc/yum/vars/releasever
      yum clean all
    fi
  fi
  PACKAGE_LIST="kernel-headers-$(uname -r) kernel-devel-$(uname -r)"
  if [[ ${OS} != "rocky8" ]] && [[ ${OS} != "rhel8" ]]; then
    PACKAGE_LIST+=" kernel-devel-matched-$(uname -r)"
  fi

  if [[ ${OS} == "rocky8" ]] || [[ ${OS} == "rocky9" ]] ; then
    for PACKAGE in ${PACKAGE_LIST}
    do
      yum install -y ${PACKAGE}
      if [ $? -ne 0 ]; then
        # Enable vault repository
        sed -i 's|^#baseurl=http://dl.rockylinux.org/$contentdir|baseurl=http://dl.rockylinux.org/vault/rocky|g' /etc/yum.repos.d/*.repo
        sed -i 's|^#baseurl=https://dl.rockylinux.org/$contentdir|baseurl=https://dl.rockylinux.org/vault/rocky|g' /etc/yum.repos.d/*.repo
        yum install -y ${PACKAGE}
      fi
    done
  else
    for PACKAGE in ${PACKAGE_LIST}
    do
      yum -y install ${PACKAGE}
    done
  fi

  yum install -y yum-plugin-versionlock
  # listing all the packages because wildcard does not work as expected
  yum versionlock kernel kernel-core kernel-modules

  if [[ ${OS} == "rocky8" ]] || [[ ${OS} == "rocky9" ]] ; then
    yum versionlock rocky-release rocky-repos
  elif [[ ${OS} == "rhel8" ]] || [[ ${OS} == "rhel9" ]] ; then
    yum versionlock redhat-release
  fi
else
  apt-get -y install linux-headers-$(uname -r)
  apt-mark hold linux-aws* linux-base* linux-headers* linux-image* 
fi

(This step also logs the same Docker yum errors, but they appear to be ignored.)
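The pin step writes /etc/yum/vars/releasever only when the file does not already exist. Once the file is present, its contents are substituted for $releasever in every repo that uses that variable, including Docker's. The sketch below is a simplified model of that lookup. resolve_releasever is hypothetical, it uses a scratch directory instead of /etc/yum/vars, and it ignores other variable sources such as /etc/dnf/vars.

```shell
#!/bin/sh
# Simplified model: a non-empty "releasever" file in the vars directory
# overrides the value derived from the installed release package.
set -eu

# Hypothetical helper; $1 = vars directory, $2 = release-package value.
resolve_releasever() {
  if [ -s "$1/releasever" ]; then
    head -n 1 "$1/releasever"   # override present, e.g. "9.6"
  else
    printf '%s\n' "$2"          # no override, e.g. "9"
  fi
}

# Scratch directory standing in for /etc/yum/vars.
vars_dir=$(mktemp -d)
trap 'rm -rf "$vars_dir"' EXIT

resolve_releasever "$vars_dir" 9   # before the pin step: prints 9

# Same guard as the build step: write only if no override exists yet.
if [ ! -f "$vars_dir/releasever" ]; then
  echo 9.6 > "$vars_dir/releasever"
fi

resolve_releasever "$vars_dir" 9   # after the pin step: prints 9.6
```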

I don't know how common it is to overwrite /etc/yum/vars/releasever like this, but it breaks at least Docker CE, which is a widely used tool.
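One possible workaround is to hard-code the major version in Docker's repo file on the parent AMI, so the pinned $releasever no longer affects it. This sketch has not been verified against ParallelCluster. It assumes GNU sed (as shipped on Rocky 9) and that the repo file is at Docker's default location, /etc/yum.repos.d/docker-ce.repo. pin_docker_repo_major is a hypothetical helper. The demo runs on a scratch file, not the real repo file.

```shell
#!/bin/sh
# Workaround sketch: replace $releasever with a fixed major version,
# but only on baseurl lines that point at download.docker.com.
set -eu

# Hypothetical helper; $1 = repo file, $2 = major version to hard-code.
pin_docker_repo_major() {
  # GNU sed: \|...| selects Docker baseurl lines; s|\$releasever|...|g
  # replaces the literal text "$releasever" on those lines only.
  sed -i -e '\|^baseurl=https://download\.docker\.com/|s|\$releasever|'"$2"'|g' "$1"
}

# Demo on a scratch copy instead of /etc/yum.repos.d/docker-ce.repo.
demo=$(mktemp)
printf '%s\n' '[docker-ce-stable]' \
  'baseurl=https://download.docker.com/linux/rhel/$releasever/$basearch/stable' > "$demo"
pin_docker_repo_major "$demo" 9
grep '^baseurl=' "$demo"   # baseurl=https://download.docker.com/linux/rhel/9/$basearch/stable
rm -f "$demo"
```

Hard-coding the major version keeps the Docker repo working whatever the build writes to /etc/yum/vars/releasever. The trade-off is that the file must be edited again after a major-version upgrade.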
