Contributor

@behzad-mir behzad-mir commented Aug 27, 2025

A number of changes are made to stateless CNI to fully support SwiftV2 on Linux:

  1. For stateless CNI, the EndpointID should append ifName when the NICType is Delegated, to distinguish that endpoint from the InfraNIC endpoint. This is needed because stateless CNI uses only the ContainerID as the endpoint ID.
  2. The delete flow has been revised, and NetNSPath has been added to the statefile, since the TransparentClient needs it for the frontend NIC.
  3. Stateful CNI uses Transparent mode for SwiftV1 and V2, and stateless CNI should do the same. TransparentVlan, the original value, appears to be a mistake.

To validate the scenario, ADD/DEL calls were issued on a SwiftV2 cluster, and the logs and statefile were analyzed to confirm that they are consistent with stateful CNI and that nothing is leaked.

Requirements:

Notes:

@Copilot Copilot AI review requested due to automatic review settings August 27, 2025 07:19
@behzad-mir behzad-mir requested review from a team as code owners August 27, 2025 07:19
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes issues with Stateless CNI delete operations in SwiftV2 scenarios by modifying endpoint ID generation and improving the delete flow. The changes ensure proper distinction between different NIC types and provide necessary context for transparent client operations.

  • Modifies GetEndpointID to accept a NICType parameter and append interface name for delegated NICs
  • Updates delete flow to use proper network manager clients and adds NetNsPath to state information
  • Adds NetworkNameSpace field to CNS REST server structures for frontend NIC support

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • network/manager.go: Core logic changes for endpoint ID generation and delete flow improvements
  • network/manager_mock.go: Mock implementation updated to match new GetEndpointID signature
  • cns/restserver/restserver.go: Added NetworkNameSpace field to IPInfo struct
  • cns/restserver/ipam.go: Updated validation and state management for NetworkNameSpace field
  • cni/network/network.go: Updated callers to pass NICType parameter to GetEndpointID

@@ -514,7 +514,7 @@ func (nm *networkManager) DeleteEndpointState(networkID string, epInfo *Endpoint
nw := &network{
Id: networkID, // currently unused in stateless cni
HnsId: epInfo.HNSNetworkID,
- Mode: opModeTransparentVlan,
+ Mode: opModeTransparent,

Copilot AI Aug 27, 2025

The change from opModeTransparentVlan to opModeTransparent appears unrelated to the stated SwiftV2 fixes. This mode change could have significant side effects and should be explained or verified as intentional.

Suggested change
- Mode: opModeTransparent,
+ Mode: opModeTransparentVlan,

Contributor Author

Stateful CNI uses Transparent mode for SwiftV1 and V2, and stateless CNI should do the same. TransparentVlan, the original value, appears to be a mistake.

}

// verifyUpdateEndpointStateRequest verify the CNI request body for the UpdateENdpointState API
func verifyUpdateEndpointStateRequest(req map[string]*IPInfo) error {
for ifName, InterfaceInfo := range req {
- if InterfaceInfo.HostVethName == "" && InterfaceInfo.HnsEndpointID == "" && InterfaceInfo.NICType == "" && InterfaceInfo.MacAddress == "" {
+ if InterfaceInfo.HostVethName == "" && InterfaceInfo.HnsEndpointID == "" && InterfaceInfo.NICType == "" && InterfaceInfo.MacAddress == "" && InterfaceInfo.NetworkNameSpace == "" {

Copilot AI Aug 27, 2025

The condition InterfaceInfo.NICType == "" is incorrect. NICType is of type cns.NICType, not string, so this comparison will always be false. It should be compared against the zero value of the NICType enum or use a different validation approach.

Suggested change
- if InterfaceInfo.HostVethName == "" && InterfaceInfo.HnsEndpointID == "" && InterfaceInfo.NICType == "" && InterfaceInfo.MacAddress == "" && InterfaceInfo.NetworkNameSpace == "" {
+ if InterfaceInfo.HostVethName == "" && InterfaceInfo.HnsEndpointID == "" && InterfaceInfo.NICType == cns.NICType(0) && InterfaceInfo.MacAddress == "" && InterfaceInfo.NetworkNameSpace == "" {

Member

@behzad-mir can you fix this?

Contributor Author

I don't think this is actually correct, as NICType is not an enum; it is indeed a string type!
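For reference, a minimal Go sketch of the point above, assuming cns.NICType is declared as a named string type (type NICType string), as the author states. The NICType, ipInfo and isEmpty names below are local stand-ins, and the constant values are placeholders rather than the real cns values. For a string-based type the zero value is "", so comparing against "" compiles and is the correct emptiness check.

```go
package main

import "fmt"

// NICType is a local stand-in for cns.NICType, assumed to be a named string
// type (not an integer enum). Constant values are placeholders.
type NICType string

const (
	InfraNIC       NICType = "InfraNIC"
	DelegatedVMNIC NICType = "DelegatedVMNIC"
)

// ipInfo is a hypothetical, trimmed stand-in for the CNS IPInfo struct.
type ipInfo struct {
	HostVethName     string
	HnsEndpointID    string
	NICType          NICType
	MacAddress       string
	NetworkNameSpace string
}

// isEmpty mirrors the PR's validation condition: every field, including
// NICType, is compared against its zero value "".
func isEmpty(info ipInfo) bool {
	return info.HostVethName == "" && info.HnsEndpointID == "" &&
		info.NICType == "" && info.MacAddress == "" && info.NetworkNameSpace == ""
}

func main() {
	var zero NICType
	fmt.Println(zero == "")                               // true
	fmt.Println(isEmpty(ipInfo{}))                        // true
	fmt.Println(isEmpty(ipInfo{NICType: DelegatedVMNIC})) // false
}
```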

@@ -1313,12 +1313,16 @@ func updateIPInfoMap(iPInfo map[string]*IPInfo, interfaceInfo *IPInfo, ifName, e
iPInfo[ifName].MacAddress = interfaceInfo.MacAddress
logger.Printf("[updateEndpoint] update the endpoint %s with MacAddress %s", endpointID, interfaceInfo.MacAddress)
}
if interfaceInfo.NetworkNameSpace != "" {
Contributor

We protect against an empty string; what about " "? Is that a possible condition?

Contributor

What happens if the pod is no longer present? I don't know how the netns path is being used here. Do we check this?

Contributor Author

This is where we write to the statefile, so we can store the pod information for when a delete call gets issued. We are just checking for empty to avoid writing an empty value to the field in the statefile. We read NetNs from whatever the CNI produces, so I don't think it can be " ".

Contributor Author

In SwiftV2, when a delete call happens, the frontend NIC needs the netNsPath: we have a secondary_transparentClient that moves the interface out of the network namespace.
This is already used by stateful CNI, and we are just correcting it for stateless CNI.

Member

What's the issue if we write an empty namespace?

Contributor Author

@behzad-mir behzad-mir Sep 5, 2025

This is just to make the statefile neater. There is no reason to add a field when it is empty.

@@ -1313,12 +1313,16 @@ func updateIPInfoMap(iPInfo map[string]*IPInfo, interfaceInfo *IPInfo, ifName, e
iPInfo[ifName].MacAddress = interfaceInfo.MacAddress
logger.Printf("[updateEndpoint] update the endpoint %s with MacAddress %s", endpointID, interfaceInfo.MacAddress)
}
if interfaceInfo.NetworkNameSpace != "" {
iPInfo[ifName].NetworkNameSpace = interfaceInfo.NetworkNameSpace
logger.Printf("[updateEndpoint] update the endpoint %s with NetworkNameSpace %s", endpointID, interfaceInfo.NetworkNameSpace)
Contributor

@MikeZappa87 MikeZappa87 Aug 27, 2025

This statement has no level (debug, info, warning, error)? I see this is what is currently done above, but this makes troubleshooting hard, does it not? We can't filter on the level. This should be a tech-debt item.

Contributor Author

You're correct. We need to add levels to all of these log statements. I will address some of them in the next commit.

Member

Don't we have logger.Info or logger.Error?

Contributor Author

This uses the old CNS logger, which has no Info method, only Printf.
All the logs in IPAM and other parts of the CNS code need to be moved to loggerV2, which has proper logging levels. I used the old logger to keep the PR consistent with the rest of the file.

@@ -115,7 +115,7 @@ type NetworkManager interface {
DetachEndpoint(networkID string, endpointID string) error
UpdateEndpoint(networkID string, existingEpInfo *EndpointInfo, targetEpInfo *EndpointInfo) error
GetNumberOfEndpoints(ifName string, networkID string) int
- GetEndpointID(containerID, ifName string) string
+ GetEndpointID(containerID, ifName string, nicType cns.NICType) string
Contributor

Do we have any tests?

Contributor Author

Unfortunately we don't. I am planning to add a follow-up (sister) PR that adds UTs for some of these functions.

Contributor

@santhoshmprabhu santhoshmprabhu left a comment

I don't have a lot of context on the need for these changes. The changes look fine, but we should improve the PR a bit:

  1. Update the description to explain the opMode change.
  2. Address the other Copilot comments (the one about NICType in particular seems serious).
  3. Add a description of the validation steps that have been carried out. Include screenshots/logs if necessary.
  4. Add tests.

@@ -115,7 +115,7 @@ type NetworkManager interface {
DetachEndpoint(networkID string, endpointID string) error
UpdateEndpoint(networkID string, existingEpInfo *EndpointInfo, targetEpInfo *EndpointInfo) error
GetNumberOfEndpoints(ifName string, networkID string) int
- GetEndpointID(containerID, ifName string) string
+ GetEndpointID(containerID, ifName string, nicType cns.NICType) string
Member

Instead of changing the existing API, can we define a new API, GetEndpointIDByNicType?

Contributor Author

Addressed in the new commit

@@ -514,7 +514,7 @@ func (nm *networkManager) DeleteEndpointState(networkID string, epInfo *Endpoint
nw := &network{
Id: networkID, // currently unused in stateless cni
HnsId: epInfo.HNSNetworkID,
- Mode: opModeTransparentVlan,
+ Mode: opModeTransparent,
Member

Was this a bug in a previous PR?

Contributor Author

Yes. For stateless CNI on Windows this did not matter. On Linux the delete still executed fine, since this was not multitenancy, but it removed the veth interface instead of the IP route because it fell through to bridge mode (the default).
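A simplified Go sketch of the fall-through described above. This is not the real endpoint-client selection code: deleteClientFor is hypothetical, the client names are used only as labels, and the mode strings are assumed to match the network package's opMode constants. It encodes just two rules from this thread: TransparentVlan is honored only with multitenancy, and anything unmatched falls back to bridge mode.

```go
package main

import "fmt"

// Assumed to match the network package's opMode constants.
const (
	opModeBridge          = "bridge"
	opModeTransparent     = "transparent"
	opModeTransparentVlan = "transparent-vlan"
)

// deleteClientFor is a hypothetical simplification of how the Linux delete
// path picks an endpoint client; the returned names are labels only.
func deleteClientFor(mode string, multitenancy bool) string {
	switch {
	case mode == opModeTransparentVlan && multitenancy:
		return "TransparentVlanEndpointClient"
	case mode == opModeTransparent:
		return "TransparentEndpointClient"
	default:
		// Unmatched modes end up in the bridge (default) path.
		return "LinuxBridgeEndpointClient"
	}
}

func main() {
	// Before the fix: stateless delete used TransparentVlan without multitenancy.
	fmt.Println(deleteClientFor(opModeTransparentVlan, false)) // LinuxBridgeEndpointClient
	// After the fix: same mode as stateful CNI.
	fmt.Println(deleteClientFor(opModeTransparent, false)) // TransparentEndpointClient
}
```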

*opt.infraSeen = true
} else {
ifName = "eth" + strconv.Itoa(opt.endpointIndex)
- endpointID = plugin.nm.GetEndpointID(opt.args.ContainerID, ifName)
+ endpointID = plugin.nm.GetEndpointID(opt.args.ContainerID, ifName, opt.ifInfo.NICType)
Contributor

What scenarios was this enabled in, before and currently?
Windows swift v1 (single nic)? y/n
Windows swift v2 (multi nic)? y/n
Linux swift v1 (single nic)? y/n
Linux swift v2 (multi nic)? y/n

Contributor Author

This has been enabled in all SwiftV2 multi-NIC scenarios and also in Singularity.
The new commit defines a new API, GetEndpointIDByNicType, to avoid touching the previous one, and it only changes the containerID-based endpoint ID in the stateless CNI DelegatedNIC case.
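A minimal sketch of the stateless behavior described for GetEndpointIDByNicType, written as a standalone function. The statelessEndpointID name, the placeholder NICType constants, and the "-" separator are assumptions; stateful mode keeps using the existing GetEndpointID and is not modeled here.

```go
package main

import "fmt"

// NICType is a local stand-in for cns.NICType (a named string type).
// Constant values are placeholders, not the real cns values.
type NICType string

const (
	InfraNIC       NICType = "InfraNIC"
	DelegatedVMNIC NICType = "DelegatedVMNIC"
)

// statelessEndpointID sketches the stateless-CNI rule from this PR: the
// endpoint ID is the container ID, and only a delegated NIC gets the interface
// name appended so it cannot collide with the InfraNIC endpoint.
// The "-" separator is an assumption.
func statelessEndpointID(containerID, ifName string, nicType NICType) string {
	if nicType == DelegatedVMNIC {
		return containerID + "-" + ifName
	}
	return containerID
}

func main() {
	cid := "0123456789abcdef"
	fmt.Println(statelessEndpointID(cid, "eth0", InfraNIC))       // 0123456789abcdef
	fmt.Println(statelessEndpointID(cid, "eth1", DelegatedVMNIC)) // 0123456789abcdef-eth1
}
```

Keeping this in a separate function, rather than changing GetEndpointID's signature, matches the reviewer's request to leave the existing API untouched.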

HostVethName string `json:",omitempty"`
MacAddress string `json:",omitempty"`
NICType cns.NICType `json:",omitempty"`
NetworkNameSpace string `json:",omitempty"`
Contributor

To confirm, we are adding NetworkNameSpace to the CNS IPInfo because this information is required during Linux endpoint deletion, right?

Contributor Author

Yes, that's correct: it's needed to remove the frontend NIC from the namespace.

Member

Won't the namespace be provided by containerd during CNI delete?

Contributor Author

I will check that. But currently, in the stateful case, this is supplied via the statefile.

if nm.IsStatelessCNIMode() {
if nicType == cns.DelegatedVMNIC {
Contributor

@QxBytes QxBytes Aug 27, 2025

Can you explain the reasoning for adding this? What does it fix, and what about other NIC types (Frontend, etc.)? How does it affect Windows and Linux (since I believe there are delegated VM NICs in both)?

Contributor Author

Addressed in the new commit

@@ -537,11 +538,12 @@ func (nm *networkManager) DeleteEndpointState(networkID string, epInfo *Endpoint
NetworkContainerID: epInfo.NetworkContainerID, // we don't use this as long as AllowInboundFromHostToNC and AllowInboundFromNCToHost are false
NetNs: dummyGUID, // to trigger hnsv2, windows
NICType: epInfo.NICType,
NetworkNameSpace: epInfo.NetNsPath,
IfName: epInfo.IfName, // TODO: For stateless cni linux populate IfName here to use in deletion in secondary endpoint client
Contributor

Feel free to delete this comment if this pr fixes linux stateless cni

@behzad-mir behzad-mir force-pushed the swiftv2-stateless branch 4 times, most recently from 0def5cf to feaacea September 12, 2025 02:40

5 participants