✨ IPv6 support for self-managed clusters #5603
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has not yet been approved by any approver; the full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
Welcome @tthvo!
Hi @tthvo. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cc @nrb @sadasu @patrickdillon I am not yet sure what to do with e2e tests, or whether any existing ones cover IPv6 clusters... I leave it as a pending TODO.
@tthvo: GitHub didn't allow me to request PR reviews from the following users: sadasu, patrickdillon. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
A quick preview of
/ok-to-test
/assign @mtulio Asking you for a review, Marco, as I know you have been working on this downstream.
func (s *Service) getNat64PrivateRoute(natGatewayID string) *ec2.CreateRouteInput {
Question: I don't know what makes this a private route. I see getNatGatewayPrivateRoute() also implemented similarly.
Oh, the "private" here means that the route is added to the route table associated with a private subnet.
Now that I think about it more, this route should be added for both private and public subnets (i.e. public subnet only install) to allow IPv6-only workloads in those subnets to reach IPv4-only internet services.
Now that I think about it more, this route should be added for both private and public subnets (i.e. public subnet only install)
Sorry, I wanted to correct myself 😓 In a public-subnet-only install, no NAT gateways are created, so we should not add a NAT64 route pointing at a NAT gateway.
In other words, NAT64 is only applicable to private subnets, as currently implemented.
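Concretely, the NAT64 arrangement discussed above amounts to one extra route per private route table: the well-known NAT64 prefix 64:ff9b::/96 pointed at the NAT gateway. Below is a minimal sketch of that shape, using local stand-in types rather than the AWS SDK's ec2.CreateRouteInput; the field and helper names are illustrative, not the PR's exact code.

```go
package main

import "fmt"

// CreateRouteInput is a local stand-in for the AWS SDK type of the same name.
type CreateRouteInput struct {
	RouteTableID             string
	DestinationIpv6CidrBlock string
	NatGatewayID             string
}

// nat64PrivateRoute sketches getNat64PrivateRoute: IPv6-only hosts reach
// IPv4-only services by sending traffic for the well-known NAT64 prefix
// (64:ff9b::/96) to the NAT gateway, which translates it to IPv4.
func nat64PrivateRoute(routeTableID, natGatewayID string) CreateRouteInput {
	return CreateRouteInput{
		RouteTableID:             routeTableID,
		DestinationIpv6CidrBlock: "64:ff9b::/96", // well-known NAT64 prefix (RFC 6052)
		NatGatewayID:             natGatewayID,
	}
}

func main() {
	r := nat64PrivateRoute("rtb-0abc", "nat-0def")
	fmt.Println(r.DestinationIpv6CidrBlock, "->", r.NatGatewayID)
}
```

Since only private route tables have a NAT gateway to point at, this route is installed only for private subnets, matching the conclusion of the thread above.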
/retest-required
Status: As of Sep 29, 2025, the only tested scenarios in the PR are:
Todo: I am currently exploring options/changes to support dualstack-ipv4-primary, and also testing a couple more scenarios (e.g. dualstack public subnets and IPv4-only subnets, edge subnets). Will update once done 👀
(AWSCluster resource)
AWS requires that when registering targets by instance ID for an IPv6 target group, the targets must have an assigned primary IPv6 address. Note: The default subnets managed by CAPA are already set up to assign IPv6 addresses to newly created ENIs.
The httpProtocolIPv6 field enables or disables the IPv6 endpoint of the instance metadata service. The SDK only applies this field if httpEndpoint is enabled. When running single-stack IPv6, pods only have IPv6, so they require an IPv6 endpoint to query IMDS, as the IPv4 network is unreachable.
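A sketch of the endpoint selection this implies: the IPv6 IMDS endpoint fd00:ec2::254 (and the IPv4 link-local 169.254.169.254) are documented by AWS, but the helper name here is illustrative.

```go
package main

import "fmt"

// imdsBase returns the instance metadata service base URL. On single-stack
// IPv6 nodes the IPv4 link-local endpoint is unreachable, so pods must use
// the IPv6 endpoint, which only answers when httpProtocolIPv6 is enabled
// on the instance.
func imdsBase(ipv6 bool) string {
	if ipv6 {
		return "http://[fd00:ec2::254]" // IMDS IPv6 endpoint
	}
	return "http://169.254.169.254" // IMDS IPv4 link-local endpoint
}

func main() {
	// IMDSv2 flow: PUT /latest/api/token first, then GET with the token header.
	fmt.Println(imdsBase(true) + "/latest/api/token")
}
```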
In the case where the egress-only internet gateway is deleted, the CAPA reconciliation loop will create a new one. Thus, CAPA needs to modify the routes to point to the new eigw ID.
This allows IPv6-only workloads to reach IPv4-only services. AWS supports this via NAT64/DNS64. More details: https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway-nat64-dns64.html
The DescribeEgressOnlyInternetGateways API does not support the attachment.vpc-id filter, so the call returns every available eigw. Consequences:
- CAPA incorrectly selects an unintended eigw for use, leading to route creation failure since the eigw belongs to a different VPC.
- CAPA incorrectly destroys the eigws of all VPCs. This is catastrophic, as it can break other workloads.
This commit changes the filter to use the cluster tag instead. An additional safeguard checks that the eigw is truly attached to the VPC.
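A sketch of the tag-plus-attachment selection described in that commit, using local stand-in types; the cluster tag key shown is illustrative, not necessarily CAPA's exact key.

```go
package main

import "fmt"

// Local stand-ins for the AWS SDK's egress-only internet gateway shapes.
type Attachment struct{ VpcID string }
type EgressOnlyIGW struct {
	ID          string
	Tags        map[string]string
	Attachments []Attachment
}

// selectEIGW picks the egress-only internet gateway carrying the cluster
// tag, then double-checks it is really attached to our VPC, since the
// DescribeEgressOnlyInternetGateways API cannot filter by attachment.vpc-id.
func selectEIGW(all []EgressOnlyIGW, clusterTag, vpcID string) (EgressOnlyIGW, bool) {
	for _, gw := range all {
		if _, tagged := gw.Tags[clusterTag]; !tagged {
			continue
		}
		for _, att := range gw.Attachments {
			if att.VpcID == vpcID {
				return gw, true
			}
		}
	}
	return EgressOnlyIGW{}, false
}

func main() {
	gws := []EgressOnlyIGW{
		{ID: "eigw-other", Tags: map[string]string{}, Attachments: []Attachment{{VpcID: "vpc-other"}}},
		{ID: "eigw-ours", Tags: map[string]string{"cluster-tag": "owned"}, Attachments: []Attachment{{VpcID: "vpc-ours"}}},
	}
	gw, ok := selectEIGW(gws, "cluster-tag", "vpc-ours")
	fmt.Println(gw.ID, ok)
}
```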
CAPA handles icmpv6 as protocol number 58. AWS accepts a protocol number when creating rules; however, describing a rule via the AWS API returns the protocol name, causing CAPA to not recognize it and fail.
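One way to sketch that fix is to normalize the name the describe call returns back to the stored number before comparing rules. The helper name and the exact name strings accepted are assumptions here, not the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeProtocol maps the protocol name that the describe API may
// return back to the protocol number CAPA stores, so an ICMPv6 rule
// compares equal on both sides of the create/describe round trip.
func normalizeProtocol(p string) string {
	switch strings.ToLower(p) {
	case "icmpv6", "ipv6-icmp": // assumed name variants for illustration
		return "58" // IANA protocol number for ICMPv6
	}
	return p
}

func main() {
	fmt.Println(normalizeProtocol("icmpv6")) // what the API returned
	fmt.Println(normalizeProtocol("58"))     // what CAPA stored
}
```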
…ices For IPv4, the field NodePortIngressRuleCidrBlocks specifies the allowed source IPv4 CIDRs for NodePort services on ports 30000-32767. This extends that field to also accept IPv6 source CIDRs.
We need an option to configure IPv6 source CIDRs for the SSH ingress rule of the bastion host. This extends the field allowedCIDRBlocks to also accept IPv6 CIDR blocks.
When creating a bastion host for an IPv6 cluster, the instance has both a public IPv4 and an IPv6 address, so we need to report both in the cluster status when present. This also adds a print column to display the bastion's IPv6 address.
This is a minimal template set to install an IPv6-enabled cluster. Both the control plane and worker nodes must use Nitro-based instance types (with IPv6 support).
This is a set of customized Calico CNI manifests to support single-stack IPv6 clusters. Note that VXLAN is used since IP-in-IP currently only supports IPv4. References:
- https://docs.tigera.io/calico/latest/networking/ipam/ipv6#ipv6
- https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/config-options#switching-from-ip-in-ip-to-vxlan
- https://docs.tigera.io/calico/latest/networking/configuring/vxlan-ipip
This combines the existing docs for IPv6 EKS clusters with the non-EKS ones, and properly registers the topic page in the documentation TOC.
Validation for the specified VPC and subnet CIDRs is added for early feedback from the webhook. Checks for bastion and nodePort CIDRs already exist.
The following is added:
- [BYO VPC] Mention the required route when enabling DNS64.
- [BYO VPC] Mention that CAPA only utilizes the IPv6 aspect of the dual-stack VPC.
There is a brief period where the IPv6 CIDR is not yet associated with the subnets. Thus, when CAPA creates the default dual-stack subnets, it should wait until the IPv6 CIDR is associated before proceeding. Otherwise, CAPA will misinterpret the subnet as non-IPv6 and continue its reconciliation, skipping creation of the route to the eigw. The route to the eigw for destination "::/0" is required for EC2 instance time sync on start-up.
…ined When AWSCluster.spec.network.vpc.ipv6 is non-nil, most handlers in CAPA treat it as "adding" IPv6 capabilities on top of the IPv4 infrastructure, except the security group ingress rules for the API LB. This commit aligns the API LB SG handler with the rest of the codebase. These rules can be overridden in the AWSCluster LB spec to allow only IPv6 CIDRs if needed.
The field isIpv6 is set to true if and only if the subnet has an associated IPv6 CIDR. This means the VPC is also associated with an IPv6 CIDR.
The field targetGroupIPType is added to the load balancer spec to allow configuring the IP address type of target groups for API load balancers. This field is not applicable to Classic Load Balancers (CLB). This commit also defines a new network status field recording the IP address type of API load balancers.
What type of PR is this?
/kind feature
What this PR does / why we need it:
As of today, CAPA supports IPv6 on EKS, but not on self-managed clusters. These changes bring IPv6 support to self-managed clusters, specifically single-stack IPv6 (no dualstack support yet).
Which issue(s) this PR fixes:
Fixes #2420 (part 2 for self-managed cluster, part 1 covers EKS)
Special notes for your reviewer:
test/e2e/data/cni/calico_ipv6.yaml. Calico does not support IPv6 with "IP-in-IP", so we need to use VXLAN.
Checklist:
Release note: