-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-1645: define dual stack policies and fields #5264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
MrFreezeex
commented
Apr 29, 2025
- One-line PR description: Define dual stack recommendations and fields
- Issue link: Multi-Cluster Services API #1645
- Other comments: This does three things: define initial suggestion as to what an implementation may do to support dual stack services, fix the max items for the IPs field (which is already fixed in the actual CRD) and add a IPfamilies matching the same field in the Service that implementation may use to reconcile this globally with an implementation defined policy.
e8481cf
to
092d90e
Compare
ipFamilies: | ||
- IPv4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, IPV4 and IPV6 format are quite different, are the formats in the "ips" field alone not enough for the consumer to discern? I assume that the consumer knows what IP family it can support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is more aimed toward the allocation of the IPs
than what happens after the IPs
are already actually allocated a bit like the current type
field or even how the ipFamilies
field behave on a regular Service.
For instance for Cilium we would most likely want to do an intersection of all the ipFamilies
in the exported Services which would put this as relevant as the other fields to "allocate" the IPs
(meaning to create the derived service as it how we do this) to us essentially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, IIRC, this is for some controller to act on this field? Are we not in the process of moving the serviceImport to either status or root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure what you mean but yes our controller will use this to create the derived service with the appropriate IPFamily
Triage note: Had some discussion with comments from @mikemorris last SIG-MC, can you please add them to this PR so we can talk about them? |
Hi @mikemorris are you still looking at commenting here about your concerns about this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to capture a high-level concern I have with this direction:
- Using a Service is just one possible (albeit common) implementation of MCS - alternative explorations such as ClusterIP Gateways are currently being developed and may be an option in the future . Service is a bloated resource and in general I would have a strong preference towards avoiding leaking what should be implementation details up into the actual spec resources.
- I don't think adding this field to ServiceImport should actually be necessary.
- On Service, it is optionally configured by a service owner in conjunction with
.spec.ipFamilyPolicy
to specify what IP family or familiies should be made available for a Service on a dual-stack cluster, and which family should be used for the legacy.spec.clusterIP
field. - In contrast to Service, on a ServiceImport no discretion or decision-making should be required - the available IP families will be determined by the IP families of Services exposed by ServiceExports, and may be constrained by the IP family stack configuration of the importing cluster. The exported Services may be on clusters with different dual-stack configurations (some IPv4 only, some IPv6, some dual-stack) and may have different configurations for which IPs are available for each Service in each cluster. I believe determining the appropriate dual-stack configuration should be possible by watching the associated exported Service resources (and their EndpointSlices) directly, from either a centralized controller or per-cluster decentralized API server watches.
- On Service, it is optionally configured by a service owner in conjunction with
- I don't see this field as being helpful for order-of-operations concerns in creating an implementation Service resource, because at any point the available endpoints for a ServiceImport may change (an IPv6-only cluster may go offline while an IPv4-only cluster remains available and the ServiceImport in a dual-stack cluster should likely be updated to drop its IPv6 address from the
ips
field if the topology of an implementation requires direct/flat networking and no IPv6 endpoints are available (cleaning up and removing the ServiceImport entirely if no backends are routable from the importing cluster is a viable alternative too). Similarly, if an exported Service on a new IP family becomes available when it wasn't originally, the ServiceImport should likely be updated to publish an address in theips
field for the newly-available family when adding the backends. - I think what may be helpful instead is clarifying expected behavior in the various scenarios @MrFreezeex had laid out in the presentation, and possibly encoding those in conformance tests and/or a
status
field indicating that a ServiceImport is "ready" (has backends available which are reachable from the importing cluster (which may have different constraints or meaning in centralized vs decentralized implementations) rather than expecting the ServiceImport and any supporting infra to be created (and destroyed) syncronously and be routable immediately at creation.
Thanks! Quickly answering to those point but we can have a longer discussion in the sig meeting if you are available there
IIUC all the implementation works with a Service in a way or in another, we cannot block PR to ServiceImport by saying that hypothetically another alternative to Service which is very much experimental and that (AFAIK) doesn't even have consensus among sig network/Gateway-API folks is the way forward. If we were to do that we can probably also forget about bumping MCS-API to v1beta1 too for instance...
In Cilium at least we do not have this info on the controller creating the derived Service. This controller is intentionally not connected to all the other clusters it only knows about the local ServiceImport and the (possibly not yet created) derived Service. We also do not always sync Endpoint Slices from remote clusters as an optimization (only if there is a specific annotation or that the ServiceImport is headless). So I need to get this info already merged from all clusters/"reconciled" on the ServiceImport resource directly.
Yep the ServiceImport
I was more planning to to do this in a second step/PR as while the initial use case might be tied to that PR some possible conditions on the ServiceImport might be relevant for other things (like reporting any errors related to the import 🤷♂️). I am not sure that besides checking that the ServiceImport would be ready we can do much more on the conformance tests though. I am not entirely sure what you mean about the behavior of "ready" and " to be created (and destroyed) syncronously and be routable immediately at creation" to me it would be more a place to put any error state whether it's something very generic with ready or some more specific implementation defined errors if there's a need for that. |
Okay this is the implementation detail/constraint I had not been familiar with, will need to think about this more. |
Triage notes:
|
Submariner could also make use of the |
keps/sig-multicluster/1645-multi-cluster-services-api/README.md
Outdated
Show resolved
Hide resolved
092d90e
to
1ece9e4
Compare
keps/sig-multicluster/1645-multi-cluster-services-api/README.md
Outdated
Show resolved
Hide resolved
5c6aff9
to
257df3b
Compare
keps/sig-multicluster/1645-multi-cluster-services-api/README.md
Outdated
Show resolved
Hide resolved
257df3b
to
0afee16
Compare
/lgtm |
There's a handful of details/nuance to consider, but I think I'm largely convinced it makes sense to add
|
Triage notes:
|
0afee16
to
6a19fc3
Compare
keps/sig-multicluster/1645-multi-cluster-services-api/README.md
Outdated
Show resolved
Hide resolved
Thanks for the reviews @mikemorris!
I tried to reference that more in my latest version let me know if this looks better now 👀
As I was saying a bit in the call we should probably keep this implementation defined, for instance as the result of merging IPFamilies you might want to change the order of what's in IPFamilies field and I am not sure we should prevent that. For a derived Service style implementation you could just swap the orders IPs on the ServiceImport while your derived Service stays as is or even (not specific to derived service implementation) just express that the result of the merging across your clusters should be in this order but you don't want to change IPs order so you keep it as is.
For us in Cilium we will most certainly do that although some implementation might like an union too or whatever the local cluster support because there is some magic gateway that does intelligent thing to not care about IP protocol somehow. A while ago we were saying in a sig-mc meeting that we shouldn't dictate that in the KEP (and I recognize that you are not saying here that we should, I am just saying that out loud).
Yes I haven't implemented that yet but I think what we would most likely do in CIlium is:
EDIT: For this my latest thinking about this is that we would most likely add an annotation on the derived Service on our side to mark the "real IP protocol" that we consider which means we could remove some protocol in this annotation while not necessarily removing it on the true IPFamilies field. This is again implementation defined territory and just to mention what Cilium might do with all of this!
Yep this would be supported in our case, I am not sure of this yet but maybe what we would end up doing is:
|
6a19fc3
to
efacec5
Compare
/lgtm |
Triage notes:
|
/lgtm Appreciate the thoughtfulness in considering all those scenarios/edge cases @MrFreezeex, I think with that context this is in good shape to merge. |
IPs []string `json:"ips,omitempty"` | ||
// +optional | ||
IPFamilies []corev1.IPFamily `json:"ipFamilies,omitempty"` | ||
// +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not following this conversation so apologies if this was already discussed, but this was very challenging to implement in Kube, cc @thockin as he did all the work, because IPs
and IPFamilies
are related, and the defaulting and validation or updates got very tricky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case this field is not directly controlled to by a user so there's less validation logic needed at the API level I believe (everything in ServiceImport is machine controlled and usually mirror most fields from Service). It's mostly implementation defined on how they are using exactly this. This PR is mostly defining the field (which is "internally" needed for Cilium and Submariner) and pointing to the Service doc to make sure implementations are aware of what IPFamilies does in regular Services.
Like for instance in Cilium we would want to do an intersection of all linked Service IPFamilies exported (user controlled) and reconcile this with our controller in the ServiceImport resources, then this ServiceImport (for us; using derived Service is also implementation defined) would create a derived service with the same IPFamilies field (+ some tricks to not have to recreate the derived Service if the IPFamilies change). There is a handful of viable things that could be done here, a union of IPFamilies for instance or just whatever the local cluster support/is configured for, which mainly explains that we want to not be restrictive of how this is getting defined...
Hope this helps!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the goal is to mirror the services fields why don't you add a reference? is to avoid the indirection layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all fields are relevant to ServiceImport either for instance MCS-API type
field is different and there's no targetPort
in the ports arrays.
The rest of the fields is mostly not needed for MCS, although there is the trafficDistribution
field that we haven't still added that would be the main remaining one probably. But besides that I am not aware of other things from Service that we should add here 🤔.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another factor is that ServiceImport
is a distributed resource and referencing a resource in another cluster would be problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I warned you 😄 , now you'll not be able to unsee it 😅 https://github.com/kubernetes/kubernetes/blob/d1af46054f6f0e9a0ac2fac5ef29184a315b2a41/pkg/apis/core/validation/validation.go#L8656-L8905
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the end in Kubernetes APIs we settle on:
- it is single stack IPv4 OR IPv6 (only one IP obviously)
- it is dual stack IPv4IPv6 or IPv6IPv4 (order matters) (only two IPs , one of each IP family)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see I will try to add some sentence to make it explicit that implementation should set this coherently at least
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rebased the PR so I now realize it's hard to find the diff but I added this sentence:
If `ipFamilies` is set on the ServiceImport object, it must not have duplicated
families (for instance `ipFamilies: [IPv4, IPv4]` is not valid) and the IPs
should eventually be in the same order as what is defined in `ipFamilies`.
And adding a MaxItems=2 on IPFamilies
With that it should mandate implementations to set this correctly :D.
Thanks for providing the context about how this is done for the Service!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aojea 👋, just wondering does the above looks good (or at least reasonable) to you? I am hoping to get this in a good shape for the sig-mc leads to have a look at next week meeting (30/09) to get this merged, would be awesome if you have time to confirm that by that time 🙏. If you have other suggestion(s) I would be ofc more than happy to follow up on those!
efacec5
to
b1a4e33
Compare
New changes are detected. LGTM label has been removed. |
Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>
b1a4e33
to
15e4dbd
Compare
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: MrFreezeex, skitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |