Pin AL2 Version for Linux builds #910

Open: wants to merge 2 commits into mainline

ShelbyZ (Contributor) commented Mar 18, 2025

Problem:

During new image creation it is possible (but unlikely) that the AL2 version changes between image builds, which could lead to published images built on different AL2 versions.

aws-for-fluent-bit images using AL2:

  • fluent-bit build (release/debug/init)
  • fluent-bit plugin build (kinesis/firehose/cloudwatch)
  • fluent-bit container image (all variants)

Fix:

Introduce a new field (al2_version) in the linux.version file that holds an AL2 version tag from ECR. That tag is used to pull one specific container image across all image builds. The build process (makefile) parses al2_version and passes it as a docker build argument, which overrides the previously used 2 tag.

Example:

# If not set we use 2
ARG AL2_VERSION=2
FROM public.ecr.aws/amazonlinux/amazonlinux:${AL2_VERSION}

This fix also allows using GitHub version tags to rebuild the exact container image version, since we can target the exact AL2 version used during container image creation.
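The parse-and-pass step described above could look roughly like the following Python sketch. It is not the makefile logic from this PR: it assumes linux.version is JSON with a "linux" object holding al2_version, and `read_al2_version`, `docker_build_command`, the Dockerfile path, and the image tag are all hypothetical names used for illustration. The fallback to 2 mirrors the ARG default in the example.

```python
import json

DEFAULT_AL2_TAG = "2"  # mirrors the ARG default in the Dockerfile example


def read_al2_version(version_file_text: str) -> str:
    """Return the pinned AL2 tag, or the floating "2" tag if none is set.

    Assumption: the file is JSON shaped like {"linux": {"al2_version": "..."}}.
    """
    data = json.loads(version_file_text)
    return data.get("linux", {}).get("al2_version") or DEFAULT_AL2_TAG


def docker_build_command(al2_version: str, dockerfile: str, image_tag: str) -> list[str]:
    """Build the argv for a docker build that passes the AL2 pin as a build arg."""
    return [
        "docker", "build",
        "--build-arg", f"AL2_VERSION={al2_version}",
        "-f", dockerfile,
        "-t", image_tag,
        ".",
    ]


# Hypothetical file contents; the tag is the one shown in the testing output.
pinned = read_al2_version('{"linux": {"al2_version": "2.0.20250305.0"}}')
print(" ".join(docker_build_command(pinned, "Dockerfile", "example:latest")))
```

Reading the pin once and passing the same value to every build keeps all build stages on one base image, which is the point of the change.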

Testing:

  • all linux make variants (dev/debug/release)

Image pull with changes:
=> [stage-1 1/18] FROM public.ecr.aws/amazonlinux/amazonlinux:2.0.20250305.0@sha256:ce9ae961378607d8207804db40cb0e117a32f862631c034b6 0.0s

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

ShelbyZ requested a review from a team as a code owner on Mar 18, 2025 17:20
swapneils (Contributor) commented Mar 19, 2025

There's use in this for when we need to migrate to AL2023, but I'm inclined against pinning to specific versions within the same OS. Would prefer to just set this to 2 in the Linux version profile and remove the changes to generate_changelog.sh.

AL2 is constantly pushing security updates which we can't refuse to ingest, so at any given time the best AL2 version is the latest one; if there are ever, e.g., availability issues from this, then our only option is to reallocate to fix them so we can ingest the new version. Given that, pinning to a subversion of AL2 doesn't give us any more control over what version we release, and it increases both the manual effort required for CVE updates and the chance of forgetting to update the pin (risking security issues for customers).

Separately, could you elaborate on the scenario you're seeing where we would have version differences? fluent-bit upstream doesn't use AL2, and the Go plugins are built into the aws-for-fluent-bit images themselves.

ShelbyZ (Contributor, Author) commented Mar 19, 2025

> There's use in this for when we need to migrate to AL2023, but I'm inclined against pinning to specific versions within the same OS. Would prefer to just set this to 2 in the Linux version profile and remove the changes to generate_changelog.sh.

Pinning versions gives us a guarantee that on rebuild we are building the same version of fluent-bit with the same version of AL2, which we currently lack.

> AL2 is constantly pushing security updates which we can't refuse to ingest, so at any given time the best AL2 version is the latest one; if there are ever, e.g., availability issues from this, then our only option is to reallocate to fix them so we can ingest the new version. Given that, pinning to a subversion of AL2 doesn't give us any more control over what version we release, and it increases both the manual effort required for CVE updates and the chance of forgetting to update the pin (risking security issues for customers).

This change goes in parallel with updating our internal release process for aws-for-fluent-bit container images. Part of that process would be to select/update the al2_version found within linux.version. As a future step we could aim to automate the update of the al2_version field in the file, which removes the human-error factor.
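As a rough illustration of that future automation step, the selection logic might look like the Python sketch below. It is an assumption-laden sketch, not part of this PR: it takes a list of tags as input instead of querying ECR, it assumes dated AL2 tags follow the 2.0.YYYYMMDD.N shape seen in the testing output, and `newest_al2_tag` and `needs_update` are hypothetical names. Apart from 2.0.20250305.0, the tags in the usage lines are made up for the example.

```python
import re

# Assumed tag shape, based on the 2.0.20250305.0 tag in the testing output.
AL2_TAG = re.compile(r"^2\.0\.(\d{8})\.(\d+)$")


def newest_al2_tag(tags):
    """Return the newest dated AL2 tag, or None if no tag matches the pattern."""
    dated = []
    for tag in tags:
        match = AL2_TAG.match(tag)
        if match:
            # Sort key: (date string, numeric revision); floating tags are skipped.
            dated.append((match.group(1), int(match.group(2)), tag))
    return max(dated)[2] if dated else None


def needs_update(current_pin, tags):
    """True if a dated tag exists that differs from the current pin."""
    newest = newest_al2_tag(tags)
    return newest is not None and newest != current_pin


# Illustrative input only; a real check would fetch the tag list from the registry.
available = ["2", "latest", "2.0.20250220.0", "2.0.20250305.0"]
print(newest_al2_tag(available))
print(needs_update("2.0.20250220.0", available))
```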

> Separately, could you elaborate on the scenario you're seeing where we would have version differences? fluent-bit upstream doesn't use AL2, and the Go plugins are built into the aws-for-fluent-bit images themselves.

This is specifically an "us" problem, since we build and package using AL2.

Two scenarios:

  1. [could be a problem] As listed above, a few images pull AL2 as part of their build image, and our container image for running fluent-bit uses AL2 as well. Because each step pulls the 2 tag, a new version could be published between any of the build steps, which can lead to differences between images. If we did have a customer issue, we would remedy it by doing a full rebuild and a new publish.
  2. [not entirely important] If we needed to rebuild a specific aws-for-fluent-bit container image, we would need to load the existing image, determine its AL2 version, and override the tag with that value to pull the exact version used for that build. This is because we pin to the 2 tag and not a specific version.
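For scenario 1, one way to notice drift after the fact would be to compare the AL2 references that each build step actually resolved. The Python sketch below assumes build output lines shaped like the "Image pull with changes" line in the description; `al2_references` and `has_drift` are hypothetical helpers, and the sample log lines are illustrative, not real build output.

```python
import re

# Matches the AL2 reference (tag, optionally with @digest) on a FROM log line.
AL2_FROM = re.compile(r"FROM\s+public\.ecr\.aws/amazonlinux/amazonlinux:(\S+)")


def al2_references(log_lines):
    """Collect the distinct AL2 references seen across build log lines."""
    refs = set()
    for line in log_lines:
        match = AL2_FROM.search(line)
        if match:
            refs.add(match.group(1))
    return refs


def has_drift(log_lines):
    """True if the build steps resolved more than one distinct AL2 reference."""
    return len(al2_references(log_lines)) > 1


# Illustrative lines: two steps that resolved different AL2 versions.
sample = [
    "=> [builder 1/9] FROM public.ecr.aws/amazonlinux/amazonlinux:2.0.20250220.0 0.0s",
    "=> [stage-1 1/18] FROM public.ecr.aws/amazonlinux/amazonlinux:2.0.20250305.0 0.0s",
]
print(sorted(al2_references(sample)))
print(has_drift(sample))
```

With a pinned al2_version every step should report the same reference, so a check like this would only ever flag builds that still use the floating tag.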

swapneils (Contributor) commented Mar 19, 2025

> Pinning versions gives us a guarantee that on rebuild we are building the same version of fluent-bit with the same version of AL2, which we currently lack.

Can you elaborate on how having this guarantee improves our ops or availability, given that we put images through our test pipeline before releasing them?

> This change goes in parallel with updating our internal release process for aws-for-fluent-bit container images. Part of that process would be to select/update the al2_version found within linux.version. As a future step we could aim to automate the update of the al2_version field in the file, which removes the human-error factor.

+1. If we merge this change, I recommend doing so after #912, so this PR can add the required changes to the linux.version update logic in publish_cve_update.py and to check_for_new_al_version.sh.
