fix: fixing Stateless CNI delete in SwiftV2 scenario #3967
Pull Request Overview
This PR fixes issues with Stateless CNI delete operations in SwiftV2 scenarios by modifying endpoint ID generation and improving the delete flow. The changes ensure proper distinction between different NIC types and provide necessary context for transparent client operations.
- Modifies `GetEndpointID` to accept a `NICType` parameter and append interface name for delegated NICs
- Updates delete flow to use proper network manager clients and adds `NetNsPath` to state information
- Adds `NetworkNameSpace` field to CNS REST server structures for frontend NIC support
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
network/manager.go | Core logic changes for endpoint ID generation and delete flow improvements |
network/manager_mock.go | Mock implementation updated to match new GetEndpointID signature |
cns/restserver/restserver.go | Added NetworkNameSpace field to IPInfo struct |
cns/restserver/ipam.go | Updated validation and state management for NetworkNameSpace field |
cni/network/network.go | Updated callers to pass NICType parameter to GetEndpointID |
@@ -514,7 +514,7 @@ func (nm *networkManager) DeleteEndpointState(networkID string, epInfo *Endpoint
     nw := &network{
         Id: networkID, // currently unused in stateless cni
         HnsId: epInfo.HNSNetworkID,
-        Mode: opModeTransparentVlan,
+        Mode: opModeTransparent,
The change from `opModeTransparentVlan` to `opModeTransparent` appears unrelated to the stated SwiftV2 fixes. This mode change could have significant side effects and should be explained or verified as intentional.
Suggested change:
-        Mode: opModeTransparent,
+        Mode: opModeTransparentVlan,
Transparent mode is used by stateful CNI for SwiftV1 and V2, and stateless CNI should follow the same. TransparentVlan, the original value, seems to be a mistake.
 }

 // verifyUpdateEndpointStateRequest verify the CNI request body for the UpdateENdpointState API
 func verifyUpdateEndpointStateRequest(req map[string]*IPInfo) error {
     for ifName, InterfaceInfo := range req {
-        if InterfaceInfo.HostVethName == "" && InterfaceInfo.HnsEndpointID == "" && InterfaceInfo.NICType == "" && InterfaceInfo.MacAddress == "" {
+        if InterfaceInfo.HostVethName == "" && InterfaceInfo.HnsEndpointID == "" && InterfaceInfo.NICType == "" && InterfaceInfo.MacAddress == "" && InterfaceInfo.NetworkNameSpace == "" {
The condition `InterfaceInfo.NICType == ""` is incorrect. NICType is of type `cns.NICType`, not string, so this comparison will always be false. It should be compared against the zero value of the NICType enum or use a different validation approach.
Suggested change:
-        if InterfaceInfo.HostVethName == "" && InterfaceInfo.HnsEndpointID == "" && InterfaceInfo.NICType == "" && InterfaceInfo.MacAddress == "" && InterfaceInfo.NetworkNameSpace == "" {
+        if InterfaceInfo.HostVethName == "" && InterfaceInfo.HnsEndpointID == "" && InterfaceInfo.NICType == cns.NICType(0) && InterfaceInfo.MacAddress == "" && InterfaceInfo.NetworkNameSpace == "" {
@behzad-mir can you fix this?
I don't think this is actually correct, as NICType is not an enum; it is indeed a string!
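Whether the `== ""` check is valid hinges on the underlying type of `NICType`. A minimal sketch, assuming (as the reply above states) that it is a named string type; the `NICType` declaration and the `isUnset` helper below are local stand-ins for illustration, not the actual `cns` package definitions:

```go
package main

import "fmt"

// NICType is a local stand-in for cns.NICType, assumed here to be a
// named string type, as the reply above says.
type NICType string

// isUnset reports whether the NIC type was left empty. The untyped
// constant "" converts implicitly to NICType, so the comparison compiles
// and is true exactly when the value is the zero value.
func isUnset(t NICType) bool {
	return t == ""
}

func main() {
	var zero NICType // the zero value of a string-based type is ""
	fmt.Println(isUnset(zero))               // true
	fmt.Println(isUnset(NICType("SomeNIC"))) // false
}
```

Under that assumption the original comparison is neither a compile error nor always false; it is an ordinary emptiness check.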
@@ -1313,12 +1313,16 @@ func updateIPInfoMap(iPInfo map[string]*IPInfo, interfaceInfo *IPInfo, ifName, e
         iPInfo[ifName].MacAddress = interfaceInfo.MacAddress
         logger.Printf("[updateEndpoint] update the endpoint %s with MacAddress %s", endpointID, interfaceInfo.MacAddress)
     }
+    if interfaceInfo.NetworkNameSpace != "" {
We protect against the empty string, but what about " "? Is that a possible condition?
What happens if the pod is no longer present? I don't know how the netns path is being used here. Do we check this?
This is where we write to the statefile, so we can store the pod information for when a delete call gets issued. We are just checking for empty to avoid writing an empty value to the field in the statefile. We read the NetNs from whatever CNI produces, so I don't think it can be " ".
In SwiftV2, when a delete call happens, the frontend NIC needs the netns path. We have a secondary transparent client that moves the interface from the network namespace.
This is already being used by stateful CNI, and we are just correcting it for stateless CNI.
what's the issue if we write empty namespace?
This is just to make the statefile neater. There is no reason to add a field when it is empty.
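As a rough illustration of the guard discussed in this thread, here is a minimal sketch, assuming the state is serialized with `encoding/json` and that the new field carries the same `json:",omitempty"` tag as its neighbours. The `ipInfo` struct, the `applyUpdate` and `encode` helpers, and the example values are hypothetical stand-ins, not the CNS code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ipInfo is a hypothetical, trimmed-down stand-in for the CNS IPInfo struct.
type ipInfo struct {
	MacAddress       string `json:",omitempty"`
	NetworkNameSpace string `json:",omitempty"`
}

// applyUpdate copies only the non-empty fields of update into stored,
// in the same guard style as the diff above.
func applyUpdate(stored *ipInfo, update ipInfo) {
	if update.MacAddress != "" {
		stored.MacAddress = update.MacAddress
	}
	if update.NetworkNameSpace != "" {
		stored.NetworkNameSpace = update.NetworkNameSpace
	}
}

// encode marshals the struct to JSON; marshalling plain string fields
// does not fail, so an error here would indicate a programming mistake.
func encode(v ipInfo) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	stored := ipInfo{}

	// An update without a namespace leaves the field unset, and
	// omitempty keeps it out of the serialized state.
	applyUpdate(&stored, ipInfo{MacAddress: "00:11:22:33:44:55"})
	fmt.Println(encode(stored))

	// A later update that does carry a namespace path is stored.
	applyUpdate(&stored, ipInfo{NetworkNameSpace: "/var/run/netns/example"})
	fmt.Println(encode(stored))
}
```

In this sketch the guard also means that an update with an empty namespace does not overwrite a value stored earlier; whether that matters for the real statefile is not established in the thread.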
@@ -1313,12 +1313,16 @@ func updateIPInfoMap(iPInfo map[string]*IPInfo, interfaceInfo *IPInfo, ifName, e
         iPInfo[ifName].MacAddress = interfaceInfo.MacAddress
         logger.Printf("[updateEndpoint] update the endpoint %s with MacAddress %s", endpointID, interfaceInfo.MacAddress)
     }
+    if interfaceInfo.NetworkNameSpace != "" {
+        iPInfo[ifName].NetworkNameSpace = interfaceInfo.NetworkNameSpace
+        logger.Printf("[updateEndpoint] update the endpoint %s with NetworkNameSpace %s", endpointID, interfaceInfo.NetworkNameSpace)
This statement has no level: debug, info, warning, or error. I see this is what is currently done above, but doesn't this make troubleshooting hard? We can't filter on the level. This should be a tech debt item.
You're correct. We need to add levels for all of the logging here. Will address some of them in the next commit.
Don't we have logger.Info or logger.Error?
This is the old CNS logger, which does not have Info and instead has Printf.
All the logs in IPAM and other parts of the CNS code need to be moved to loggerV2, which has proper logging levels. I used the old logger to keep the PR consistent with the rest of the file.
network/manager.go
Outdated
@@ -115,7 +115,7 @@ type NetworkManager interface {
     DetachEndpoint(networkID string, endpointID string) error
     UpdateEndpoint(networkID string, existingEpInfo *EndpointInfo, targetEpInfo *EndpointInfo) error
     GetNumberOfEndpoints(ifName string, networkID string) int
-    GetEndpointID(containerID, ifName string) string
+    GetEndpointID(containerID, ifName string, nicType cns.NICType) string
Do we have any tests?
Unfortunately we don't. I am planning a subsequent (sister) PR to this one that adds UTs for some of these funcs.
I don't have a lot of context on the need for these changes. The changes look fine, but we should improve the PR a bit -
- Update the description to cover why the opMode change
- Address other copilot comments (one about NICType in particular seems serious)
- Add a description of what validation steps have been carried out. Include screenshots/logs if necessary.
- Add tests.
network/manager.go
Outdated
@@ -115,7 +115,7 @@ type NetworkManager interface {
     DetachEndpoint(networkID string, endpointID string) error
     UpdateEndpoint(networkID string, existingEpInfo *EndpointInfo, targetEpInfo *EndpointInfo) error
     GetNumberOfEndpoints(ifName string, networkID string) int
-    GetEndpointID(containerID, ifName string) string
+    GetEndpointID(containerID, ifName string, nicType cns.NICType) string
Instead of changing the existing API, can we define a new API, `GetEndpointIDByNicType`?
Addressed in the new commit
@@ -514,7 +514,7 @@ func (nm *networkManager) DeleteEndpointState(networkID string, epInfo *Endpoint
     nw := &network{
         Id: networkID, // currently unused in stateless cni
         HnsId: epInfo.HNSNetworkID,
-        Mode: opModeTransparentVlan,
+        Mode: opModeTransparent,
was this a bug in previous PR?
Yes. For stateless CNI on Windows this did not matter. On Linux the delete actually executed fine, since this was not multitenancy. It removed the veth interface instead of the IP route, since it falls into bridge mode (the default).
cni/network/network.go
Outdated
         *opt.infraSeen = true
     } else {
         ifName = "eth" + strconv.Itoa(opt.endpointIndex)
-        endpointID = plugin.nm.GetEndpointID(opt.args.ContainerID, ifName)
+        endpointID = plugin.nm.GetEndpointID(opt.args.ContainerID, ifName, opt.ifInfo.NICType)
what scenarios was this enabled in beforehand/currently?
Windows swift v1 (single nic)? y/n
Windows swift v2 (multi nic)? y/n
Linux swift v1 (single nic)? y/n
Linux swift v2 (multi nic)? y/n
This has been enabled in all SwiftV2 multi-NIC scenarios and also Singularity.
In the new commit, a new API, GetEndpointIDByNicType, is defined to avoid touching the previous one. It only changes the containerID in the stateless CNI delegated NIC case.
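The thread does not show the body of `GetEndpointIDByNicType`, so the following is only a sketch of the behavior described in the PR overview: in stateless mode, append the interface name to the ID for delegated NICs so that several NICs of one container get distinct endpoint IDs. The `NICType` type, its constant values, the `-` separator, the `statelessEndpointID` name, and the fallback for other NIC types are all assumptions made for illustration:

```go
package main

import "fmt"

// NICType and the constants below are local stand-ins for the cns
// package; the string values are made up for this sketch.
type NICType string

const (
	InfraNIC       NICType = "infra"
	DelegatedVMNIC NICType = "delegated"
)

// statelessEndpointID is a hypothetical helper showing the idea only.
// For a delegated NIC it appends the interface name so that multiple
// NICs sharing one container ID map to different endpoint IDs; every
// other NIC type keeps the plain container ID (an assumption).
func statelessEndpointID(containerID, ifName string, nicType NICType) string {
	if nicType == DelegatedVMNIC {
		return containerID + "-" + ifName
	}
	return containerID
}

func main() {
	const containerID = "0123456789abcdef" // example value, not a real ID

	fmt.Println(statelessEndpointID(containerID, "eth0", InfraNIC))
	fmt.Println(statelessEndpointID(containerID, "eth1", DelegatedVMNIC))
	fmt.Println(statelessEndpointID(containerID, "eth2", DelegatedVMNIC))
}
```

Keeping this in a separate function, as the reviewer suggested, leaves existing callers of `GetEndpointID` untouched.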
     HostVethName string `json:",omitempty"`
     MacAddress string `json:",omitempty"`
     NICType cns.NICType `json:",omitempty"`
+    NetworkNameSpace string `json:",omitempty"`
To confirm, we are adding NetworkNameSpace to the cns and ip info because this information is required during linux endpoint deletion, right?
Yes, that's correct. It's needed for frontend NIC removal from the namespace.
Won't the namespace be provided by containerd during CNI delete?
I will check that. But currently, in the stateful case, this is supplied via the statefile.
network/manager.go
Outdated
     if nm.IsStatelessCNIMode() {
         if nicType == cns.DelegatedVMNIC {
Can you explain the reasoning for adding this? What does it fix, and what about other NIC types (Frontend, etc.)? How does it affect Windows and Linux (since I believe there are delegated VM NICs in both)?
Addressed in the new commit
@@ -537,11 +538,12 @@ func (nm *networkManager) DeleteEndpointState(networkID string, epInfo *Endpoint
         NetworkContainerID: epInfo.NetworkContainerID, // we don't use this as long as AllowInboundFromHostToNC and AllowInboundFromNCToHost are false
         NetNs: dummyGUID, // to trigger hnsv2, windows
         NICType: epInfo.NICType,
+        NetworkNameSpace: epInfo.NetNsPath,
         IfName: epInfo.IfName, // TODO: For stateless cni linux populate IfName here to use in deletion in secondary endpoint client
Feel free to delete this comment if this PR fixes Linux stateless CNI.
0def5cf to feaacea
feaacea to 305303f
A number of changes are made to stateless CNI to fully support SwiftV2 in Linux:
To validate the scenario, ADD/DELETE calls have been issued on a SwiftV2 cluster, and the logs and statefile have been analyzed to make sure they are consistent with stateful CNI and that nothing gets leaked.
Requirements:
Notes: