@gargipanatula
Contributor

@gargipanatula gargipanatula commented Jun 18, 2025

What type of PR is this?

Uncomment only one, leave it on its own line:

/kind api-change
/kind bug

/kind cleanup

/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
Continues the work of upgrading the repo to the AWS SDK for Go V2. (PR 1: #1146, PR 2: #1157)

Notable Changes:
The IMDS client's signing name and signing region can no longer be overridden. This is because IMDS, unlike the services behind other SDK clients, doesn't support request signing: standard SDK clients sign with SigV4, while IMDS uses a different protocol.

Trivial Changes:

  1. Upgraded usage of the AWS SDK for Go KMS and IMDS packages to V2.
  2. Combined the Ec2V2 client with the EC2 client returned by Compute().
  3. Removed assumeRoleProvider, since it was only used by SDK Go V1 clients.
  4. Extended the e2e SDK client tests in aws_sdk_test.go, which now check that the signing region and signing name, not only the URL, are properly overridden.
  5. Removed a lot of unused code left over from the SDK V1 implementations, and updated the naming of methods/fields that had been distinguished as "V2".

Testing:
Ran unit tests with make test and e2e tests with the following commands:

export AWS_REGION=us-west-2
export TEST_PATH=./tests/e2e/...
export GINKGO_NODES=4
export GINKGO_FOCUS=[cloud-provider-aws-e2e]
export GINKGO_SKIP=[Disruptive]
go install github.com/onsi/ginkgo/[email protected]
export PATH=$PATH:$HOME/go/bin

pushd ./tests/e2e
ginkgo . -v -p --nodes=$GINKGO_NODES --focus=$GINKGO_FOCUS --skip=$GINKGO_SKIP

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

The Metadata client's signing name and signing region can no longer be overridden. 

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) and do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels.) labels Jun 18, 2025
@k8s-ci-robot k8s-ci-robot requested a review from nckturner June 18, 2025 21:50
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 18, 2025
@k8s-ci-robot k8s-ci-robot requested a review from olemarkus June 18, 2025 21:50
@k8s-ci-robot k8s-ci-robot added the needs-kind (Indicates a PR lacks a `kind/foo` label and requires one.), needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) and needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) labels Jun 18, 2025
@k8s-ci-robot
Contributor

Hi @gargipanatula. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.) and release-note-none (Denotes a PR that doesn't merit a release note.) labels and removed the do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels.) label Jun 18, 2025
signingRegion, signingMethod string
signingName string
}

Contributor Author

Note: This test used to test both ValidateOverride and the endpoint/signingname/signingregion overriding functionality. Now, it only tests ValidateOverride, and checking the overrides themselves has been moved to aws_sdk_test.go since override logic is configured directly with the clients.

@kmala
Member

kmala commented Jun 19, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.) label and removed the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) label Jun 19, 2025
@gargipanatula gargipanatula marked this pull request as ready for review June 19, 2025 01:37
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 19, 2025
@k8s-ci-robot k8s-ci-robot requested a review from kmala June 19, 2025 01:37
"fmt"
"strings"

awsv2 "github.com/aws/aws-sdk-go-v2/aws"
Member

nit: I see that aws is only used for converting pointers to values and vice versa. Can we remove that usage so that we don't need awsv2?

Contributor Author

Totally agree, but I think I'll punt this to my final PR in this migration, which will be solely focused on upgrading the remaining packages (just github.com/aws/aws-sdk-go/aws and github.com/aws/aws-sdk-go/aws/awserr). Wanted to keep this PR as simple as possible for ease of review.
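
For reference, a minimal sketch of how the pointer conversions could be done without the SDK's aws package, assuming small local generic helpers. The names ptr and deref are hypothetical and do not come from this PR; they only illustrate the kind of replacement for helpers such as aws.String and aws.ToString.

```go
package main

import "fmt"

// ptr returns a pointer to a copy of v. Hypothetical local replacement for
// SDK helpers such as aws.String or aws.Int32.
func ptr[T any](v T) *T {
	return &v
}

// deref returns the value p points to, or the zero value of T when p is nil.
// Hypothetical local replacement for SDK helpers such as aws.ToString.
func deref[T any](p *T) T {
	if p == nil {
		var zero T
		return zero
	}
	return *p
}

func main() {
	id := ptr("i-0123456789abcdef0")
	var missing *string

	// A nil pointer is read as the zero value instead of panicking.
	fmt.Printf("%q %q\n", deref(id), deref(missing))
}
```

Generic helpers like these require Go 1.18 or newer.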

return results, nil
}

func (s *awsSdkEC2) DescribeInstanceTopology(ctx context.Context, input *ec2.DescribeInstanceTopologyInput, optFns ...func(*ec2.Options)) ([]ec2types.InstanceTopology, error) {
Member

I see we are moving the functionality of https://github.com/kubernetes/cloud-provider-aws/blob/master/pkg/services/aws_ec2.go here. Can we deprecate that file, or is it used somewhere else?

Contributor Author

Thanks for the catch - these have indeed been moved over so we can deprecate these files.

instance, err := c.getInstanceByID(ctx, instanceID)
instanceIDBytes, err := io.ReadAll(instanceIDMetadata.Content)
if err != nil {
panic("unable to parse instance id")
Member

why panic here?

Contributor Author

Ah yeah, this should probably be an error to match the rest of the code. Will change this in the next commit.

p.addAPILoggingMiddleware(&cfg)
imdsClient := imds.New(imds.Options{ClientEnableState: imds.ClientEnabled})
getInstanceIdentityDocumentOutput, err := imdsClient.GetInstanceIdentityDocument(context.Background(), &imds.GetInstanceIdentityDocumentInput{})
identity := getInstanceIdentityDocumentOutput.InstanceIdentityDocument
Member

Will getInstanceIdentityDocumentOutput have any value if there is an error?

Contributor Author

Thanks for the catch - we should check if err != nil here and return an error, since an error is returned when the request fails or the response cannot be parsed.

Contributor

Yeah, the line identity := getInstanceIdentityDocumentOutput.InstanceIdentityDocument should be moved under the err == nil check.


func (p *awsSDKProvider) Metadata() (config.EC2Metadata, error) {
sess, err := session.NewSession(&aws.Config{
EndpointResolver: p.cfg.GetResolver(),
Member

This should be migrated right?

Contributor Author

Thanks for the catch - migrated endpoint resolution. IMDS doesn't support signing, so its resolver will only override the URL. Standard SDK clients use SigV4 while IMDS uses a different protocol.

sess, err := session.NewSession(&aws.Config{
EndpointResolver: p.cfg.GetResolver(),
})
cfg, err := awsConfig.LoadDefaultConfig(context.TODO())
Contributor

nit: use the default mode whenever possible

@k8s-ci-robot k8s-ci-robot added the release-note (Denotes a PR that will be considered when it comes time to generate release notes.) label and removed the release-note-none (Denotes a PR that doesn't merit a release note.) label Jun 22, 2025
func (cfg *CloudConfig) GetIMDSEndpointOpts() []func(*imds.Options) {
opts := []func(*imds.Options){}
for _, override := range cfg.ServiceOverride {
if override.Service == imds.ServiceID {
Contributor Author

Note: Instead of checking both the ServiceID and the region here, as we do for other clients, we only check the ServiceID. This does not result in a behavioral change: the resolver for the SDK V1 ec2metadata package always had an empty region at the time of the request, so only the service ID was used to decide whether to override.

Also, we are not overriding the signing name and region here since IMDS does not support signing.
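
A minimal sketch of the matching rule described in this note, under stated assumptions: options and serviceOverride are hypothetical stand-ins (not the real imds.Options or the repo's config types), and the value of metadataServiceID is assumed. Only the service ID is compared, and only the URL is applied; no signing fields are set.

```go
package main

import "fmt"

// options is a hypothetical stand-in for the IMDS client options; only the
// endpoint is modeled here.
type options struct{ Endpoint string }

// serviceOverride models one override entry from the cloud config.
type serviceOverride struct {
	Service string
	URL     string
}

// metadataServiceID is an assumed value; the PR compares against
// imds.ServiceID instead.
const metadataServiceID = "ec2imds"

// endpointOpts returns one functional option per matching override. Only the
// service ID is compared, and only the URL is overridden.
func endpointOpts(overrides []serviceOverride) []func(*options) {
	opts := []func(*options){}
	for _, o := range overrides {
		if o.Service != metadataServiceID {
			continue
		}
		url := o.URL // copy so each closure captures its own value
		opts = append(opts, func(opt *options) { opt.Endpoint = url })
	}
	return opts
}

func main() {
	overrides := []serviceOverride{
		{Service: "ec2", URL: "https://ec2.example.invalid"},
		{Service: metadataServiceID, URL: "http://localhost:1338"},
	}
	var o options
	for _, apply := range endpointOpts(overrides) {
		apply(&o)
	}
	fmt.Println(o.Endpoint)
}
```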

}
macsBytes, err := io.ReadAll(macsMetadata.Content)
if err != nil {
panic("unable to parse macs")
Member

i don't think we should panic here?

// But IMDS uses a different request pattern: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html
var opts []func(*imds.Options) = p.cfg.GetIMDSEndpointOpts()
opts = append(opts, func(o *imds.Options) {
o.ClientEnableState = imds.ClientEnabled // enable requests, otherwise the AWS_EC2_METADATA_DISABLED env var will need to be set
Member

For my understanding, what is the difference between default enabled and enabled (https://github.com/aws/aws-sdk-go-v2/blob/feature/ec2/imds/v1.16.32/feature/ec2/imds/api_client.go#L37), given that default enabled is the default value?

Contributor Author

If we do not configure this option, it is left at the default enable state (ClientDefaultEnableState) and falls back to whatever the AWS_EC2_METADATA_DISABLED env var is set to.

If we do configure it, then this overrides the value of AWS_EC2_METADATA_DISABLED and acts as if AWS_EC2_METADATA_DISABLED = false.

However, I dug into the old ec2metadata code and it essentially behaves the same way (it needs the same env var to be set to make successful requests). So, I'm going to remove this line so we don't override AWS_EC2_METADATA_DISABLED with o.ClientEnableState = imds.ClientEnabled and we can maintain parity with the old code. Thanks for the callout!

Info about AWS_EC2_METADATA_DISABLED from the docs:

AWS_EC2_METADATA_DISABLED - environment variable
Whether or not to attempt to use Amazon EC2 Instance Metadata Service (IMDS) to obtain credentials.

Default value: false.

Valid values:

true – Do not use IMDS to obtain credentials.

false – Use IMDS to obtain credentials.
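
The fallback described above can be sketched as a small decision function. This is a simplified model under stated assumptions, not the SDK's implementation: the enableState type and its constants are hypothetical stand-ins for the client enable states, and the environment value is passed in as a parameter to keep the logic easy to test.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// enableState is a hypothetical stand-in for the IMDS client enable state.
type enableState int

const (
	defaultEnableState enableState = iota // defer to the environment
	disabled                              // never make requests
	enabled                               // always make requests
)

// metadataEnabled reports whether requests would be attempted, given the
// configured state and the value of AWS_EC2_METADATA_DISABLED. An explicit
// state wins; only the default state consults the environment value.
func metadataEnabled(state enableState, envDisabled string) bool {
	switch state {
	case enabled:
		return true
	case disabled:
		return false
	default:
		return !strings.EqualFold(envDisabled, "true")
	}
}

func main() {
	env := os.Getenv("AWS_EC2_METADATA_DISABLED")
	fmt.Println(metadataEnabled(defaultEnableState, env))
}
```

Leaving the state at its default, as the author decides to do, keeps the environment variable in control.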

}
macsBytes, err := io.ReadAll(macsMetadata.Content)
if err != nil {
panic("unable to parse macs")
Member

there shouldn't be panic here right?

t.Errorf("Should succeed for case: %s, got %v", test.name, err)
}

if len(cfg.ServiceOverride) != len(test.servicesOverridden) {
Member

why is this removed? can't we test some if not all?

Contributor Author

@gargipanatula gargipanatula Jun 23, 2025

This part of the test was checking that the signing fields and URL were properly overridden using the generic resolver from V1. Since we now have resolvers on a per client basis, we are now testing the resolvers on a more e2e scale, creating a client and making mock requests (all in aws_sdk_test.go). We're still testing the override functionality, just in a different place.

This test used to test both ValidateOverride() and overriding, now it just tests ValidateOverride().

Let me know if I'm missing something here, though.

Member

I was wondering whether we could keep the check len(cfg.ServiceOverride) != len(test.servicesOverridden), but I think it should be okay since this is tested indirectly in the other function.
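
A rough sketch of the "create a client and make mock requests" style of check described in this thread, using only the standard library. It is not the test from aws_sdk_test.go: fetchFrom is a hypothetical stand-in for an SDK client with an overridden endpoint, and pathSeenByOverride is a hypothetical helper that starts a mock endpoint and reports what it received.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchFrom issues a GET against baseURL+path. It stands in for an SDK
// client whose endpoint URL has been overridden.
func fetchFrom(client *http.Client, baseURL, path string) (string, error) {
	resp, err := client.Get(baseURL + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// pathSeenByOverride starts a mock endpoint, sends one request to it, and
// returns the request path the mock observed.
func pathSeenByOverride(path string) (string, error) {
	seen := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.URL.Path
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	if _, err := fetchFrom(srv.Client(), srv.URL, path); err != nil {
		return "", err
	}
	return <-seen, nil
}

func main() {
	path, err := pathSeenByOverride("/latest/meta-data/instance-id")
	fmt.Println(path, err)
}
```

Asserting on what the mock server received checks the override at the point where it matters, namely the outgoing request.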

@gargipanatula gargipanatula force-pushed the aws-sdk-go-client-bump branch from 21f4dd2 to 23058a7 Compare June 23, 2025 18:29
@gargipanatula
Contributor Author

/retest

@kmala
Member

kmala commented Jun 23, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kmala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2025
@k8s-ci-robot k8s-ci-robot merged commit 561fe08 into kubernetes:master Jun 23, 2025
11 checks passed
k8s-ci-robot added a commit that referenced this pull request Jun 30, 2025
…#1146-#1157-#1169-#1177-upstream-release-1.33

Automated cherry pick of #1146: upgraded to ec2 v2, buggy
#1157: Update ELB and ELBV2 packages to AWS SDK Go V2
#1169: aws sdk go upgrade
#1177: update aws & awserr to go sdk v2
k8s-ci-robot added a commit that referenced this pull request Jul 3, 2025
…#1146-#1157-#1169-#1177-upstream-release-1.32

Automated cherry pick of #1146: upgraded to ec2 v2, buggy
#1157: Update ELB and ELBV2 packages to AWS SDK Go V2
#1169: aws sdk go upgrade
#1177: update aws & awserr to go sdk v2
k8s-ci-robot added a commit that referenced this pull request Oct 10, 2025
#1157-#1169-#1177-upstream-release-1.30

Automated cherry pick of #1146: upgraded to ec2 v2, buggy
#1157: Update ELB and ELBV2 packages to AWS SDK Go V2
#1169: aws sdk go upgrade
#1177: update aws & awserr to go sdk v2
k8s-ci-robot added a commit that referenced this pull request Oct 10, 2025
#1157-#1169-#1177-upstream-release-1.31

Automated cherry pick of #1146: upgraded to ec2 v2, buggy
#1157: Update ELB and ELBV2 packages to AWS SDK Go V2
#1169: aws sdk go upgrade
#1177: update aws & awserr to go sdk v2

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
needs-kind: Indicates a PR lacks a `kind/foo` label and requires one.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.


4 participants