Skip to content

Commit 5bd554b

Browse files
committed
Introduce new flag - strict-topology
With the current implementation, In delayed binding case, CSI driver is offered with all nodes topology that are matched with 'selected node' topology keys in CreateVolumeRequest.AccessibilityRequirements. So this allows the driver to select any node from the passed preferred list to create volume. But this results in scheduling failure when the volume created on a node other than Kubernetes selected node. To address this, introduced new flag "--strict-topology', when set, in case of delayed binding, the driver is offered with only selected node topology, so that driver has to create the volume on this node. Modified tests so that now every test is run with and without 'strict topology'.
1 parent 967b7a3 commit 5bd554b

File tree

6 files changed

+270
-159
lines changed

6 files changed

+270
-159
lines changed

README.md

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,7 @@ Note that the external-provisioner does not scale with more replicas. Only one e
4646

4747
### Command line options
4848

49-
#### Recommended optional arguments"
49+
#### Recommended optional arguments
5050
* `--csi-address <path to CSI socket>`: This is the path to the CSI driver socket inside the pod that the external-provisioner container will use to issue CSI operations (`/run/csi/socket` is used by default).
5151

5252
* `--enable-leader-election`: Enables leader election. This is mandatory when there are multiple replicas of the same external-provisioner running for one CSI driver. Only one of them may be active (=leader). A new leader will be re-elected when current leader dies or becomes unresponsive for ~15 seconds.
@@ -64,6 +64,8 @@ Note that the external-provisioner does not scale with more replicas. Only one e
6464
#### Other recognized arguments
6565
* `--feature-gates <gates>`: A set of comma separated `<feature-name>=<true|false>` pairs that describe feature gates for alpha/experimental features. See [list of features](#feature-status) or `--help` output for list of recognized features. Example: `--feature-gates Topology=true` to enable Topology feature that's disabled by default.
6666

67+
* `--strict-topology`: This controls what topology information is passed to `CreateVolumeRequest.AccessibilityRequirements` in case of delayed binding. See [the table below](#topology-support) for an explanation how this option changes the result. This option has no effect if either `Topology` feature is disabled or `Immediate` volume binding mode is used.
68+
6769
* `--kubeconfig <path>`: Path to Kubernetes client configuration that the external-provisioner uses to connect to Kubernetes API server. When omitted, default token provided by Kubernetes will be used. This option is useful only when the external-provisioner does not run as a Kubernetes pod, e.g. for debugging. Either this or `--master` needs to be set if the external-provisioner is being run out of cluster.
6870

6971
* `--master <url>`: Master URL to build a client config from. When omitted, default token provided by Kubernetes will be used. This option is useful only when the external-provisioner does not run as a Kubernetes pod, e.g. for debugging. Either this or `--kubeconfig` needs to be set if the external-provisioner is being run out of cluster.
@@ -83,6 +85,17 @@ Note that the external-provisioner does not scale with more replicas. Only one e
8385

8486
* `--leader-election-type`: This option was used to choose which leader election resource type to use. Currently, the option defaults to `endpoints`, but will be removed in the future to only support `Lease` based leader election.
8587

88+
### Topology support
89+
When `Topology` feature is enabled and the driver specifies `VOLUME_ACCESSIBILITY_CONSTRAINTS` in its plugin capabilities, external-provisioner prepares `CreateVolumeRequest.AccessibilityRequirements` while calling `Controller.CreateVolume`. The driver has to consider these topology constraints while creating the volume. Below table shows how these `AccessibilityRequirements` are prepared:
90+
91+
[Delayed binding](https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode) | Strict topology | [Allowed topologies](https://kubernetes.io/docs/concepts/storage/storage-classes/#allowed-topologies) | [Resulting accessability requirements](https://github.com/container-storage-interface/spec/blob/master/spec.md#createvolume)
92+
:---: |:---:|:---:|:---|
93+
Yes | Yes | Irrelevant | `Requisite` = `Preferred` = Selected node topology
94+
Yes | No | No | `Requisite` = Aggregated cluster topology<br>`Preferred` = `Requisite` with selected node topology as first element
95+
Yes | No | Yes | `Requisite` = Allowed topologies<br>`Preferred` = `Requisite` with selected node topology as first element
96+
No | Irrelevant | No | `Requisite` = Aggregated cluster topology<br>`Preferred` = `Requisite` with randomly selected node topology as first element
97+
No | Irrelevant | Yes | `Requisite` = Allowed topologies<br>`Preferred` = `Requisite` with randomly selected node topology as first element
98+
8699
### CSI error and timeout handling
87100
The external-provisioner invokes all gRPC calls to CSI driver with timeout provided by `--timeout` command line argument (15 seconds by default).
88101

cmd/csi-provisioner/csi-provisioner.go

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -62,6 +62,7 @@ var (
6262

6363
enableLeaderElection = flag.Bool("enable-leader-election", false, "Enables leader election. If leader election is enabled, additional RBAC rules are required. Please refer to the Kubernetes CSI documentation for instructions on setting up these RBAC rules.")
6464
leaderElectionType = flag.String("leader-election-type", "endpoints", "the type of leader election, options are 'endpoints' (default) or 'leases' (strongly recommended). The 'endpoints' option is deprecated in favor of 'leases'.")
65+
strictTopology = flag.Bool("strict-topology", false, "Passes only selected node topology to CreateVolume Request, unlike default behavior of passing aggregated cluster topologies that match with topology keys of the selected node.")
6566

6667
featureGates map[string]bool
6768
provisionController *controller.ProvisionController
@@ -178,7 +179,7 @@ func main() {
178179

179180
// Create the provisioner: it implements the Provisioner interface expected by
180181
// the controller
181-
csiProvisioner := ctrl.NewCSIProvisioner(clientset, *operationTimeout, identity, *volumeNamePrefix, *volumeNameUUIDLength, grpcClient, snapClient, provisionerName, pluginCapabilities, controllerCapabilities, supportsMigrationFromInTreePluginName)
182+
csiProvisioner := ctrl.NewCSIProvisioner(clientset, *operationTimeout, identity, *volumeNamePrefix, *volumeNameUUIDLength, grpcClient, snapClient, provisionerName, pluginCapabilities, controllerCapabilities, supportsMigrationFromInTreePluginName, *strictTopology)
182183
provisionController = controller.NewProvisionController(
183184
clientset,
184185
provisionerName,

pkg/controller/controller.go

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -161,6 +161,7 @@ type csiProvisioner struct {
161161
pluginCapabilities connection.PluginCapabilitySet
162162
controllerCapabilities connection.ControllerCapabilitySet
163163
supportsMigrationFromInTreePluginName string
164+
strictTopology bool
164165
}
165166

166167
var _ controller.Provisioner = &csiProvisioner{}
@@ -216,7 +217,8 @@ func NewCSIProvisioner(client kubernetes.Interface,
216217
driverName string,
217218
pluginCapabilities connection.PluginCapabilitySet,
218219
controllerCapabilities connection.ControllerCapabilitySet,
219-
supportsMigrationFromInTreePluginName string) controller.Provisioner {
220+
supportsMigrationFromInTreePluginName string,
221+
strictTopology bool) controller.Provisioner {
220222

221223
csiClient := csi.NewControllerClient(grpcClient)
222224
provisioner := &csiProvisioner{
@@ -232,6 +234,7 @@ func NewCSIProvisioner(client kubernetes.Interface,
232234
pluginCapabilities: pluginCapabilities,
233235
controllerCapabilities: controllerCapabilities,
234236
supportsMigrationFromInTreePluginName: supportsMigrationFromInTreePluginName,
237+
strictTopology: strictTopology,
235238
}
236239
return provisioner
237240
}
@@ -432,7 +435,8 @@ func (p *csiProvisioner) Provision(options controller.ProvisionOptions) (*v1.Per
432435
p.driverName,
433436
options.PVC.Name,
434437
options.StorageClass.AllowedTopologies,
435-
options.SelectedNode)
438+
options.SelectedNode,
439+
p.strictTopology)
436440
if err != nil {
437441
return nil, fmt.Errorf("error generating accessibility requirements: %v", err)
438442
}

pkg/controller/controller_test.go

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -392,7 +392,7 @@ func TestCreateDriverReturnsInvalidCapacityDuringProvision(t *testing.T) {
392392
defer driver.Stop()
393393

394394
pluginCaps, controllerCaps := provisionCapabilities()
395-
csiProvisioner := NewCSIProvisioner(nil, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "")
395+
csiProvisioner := NewCSIProvisioner(nil, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "", false)
396396

397397
// Requested PVC with requestedBytes storage
398398
deletePolicy := v1.PersistentVolumeReclaimDelete
@@ -1287,7 +1287,7 @@ func runProvisionTest(t *testing.T, k string, tc provisioningTestcase, requested
12871287
}
12881288

12891289
pluginCaps, controllerCaps := provisionCapabilities()
1290-
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "")
1290+
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "", false)
12911291

12921292
out := &csi.CreateVolumeResponse{
12931293
Volume: &csi.Volume{
@@ -1650,7 +1650,7 @@ func TestProvisionFromSnapshot(t *testing.T) {
16501650
})
16511651

16521652
pluginCaps, controllerCaps := provisionFromSnapshotCapabilities()
1653-
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, client, driverName, pluginCaps, controllerCaps, "")
1653+
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, client, driverName, pluginCaps, controllerCaps, "", false)
16541654

16551655
out := &csi.CreateVolumeResponse{
16561656
Volume: &csi.Volume{
@@ -1801,7 +1801,7 @@ func TestProvisionWithTopologyEnabled(t *testing.T) {
18011801
}
18021802

18031803
clientSet := fakeclientset.NewSimpleClientset(nodes, nodeInfos)
1804-
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "")
1804+
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "", false)
18051805

18061806
pv, err := csiProvisioner.Provision(controller.ProvisionOptions{
18071807
StorageClass: &storagev1.StorageClass{},
@@ -1855,7 +1855,7 @@ func TestProvisionWithTopologyDisabled(t *testing.T) {
18551855

18561856
clientSet := fakeclientset.NewSimpleClientset()
18571857
pluginCaps, controllerCaps := provisionWithTopologyCapabilities()
1858-
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "")
1858+
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "", false)
18591859

18601860
out := &csi.CreateVolumeResponse{
18611861
Volume: &csi.Volume{
@@ -1902,7 +1902,7 @@ func TestProvisionWithMountOptions(t *testing.T) {
19021902

19031903
clientSet := fakeclientset.NewSimpleClientset()
19041904
pluginCaps, controllerCaps := provisionCapabilities()
1905-
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "")
1905+
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "", false)
19061906

19071907
out := &csi.CreateVolumeResponse{
19081908
Volume: &csi.Volume{
@@ -2076,7 +2076,7 @@ func runDeleteTest(t *testing.T, k string, tc deleteTestcase) {
20762076
}
20772077

20782078
pluginCaps, controllerCaps := provisionCapabilities()
2079-
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "")
2079+
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test", 5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "", false)
20802080

20812081
err = csiProvisioner.Delete(tc.persistentVolume)
20822082
if tc.expectErr && err == nil {

pkg/controller/topology.go

Lines changed: 66 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -139,12 +139,15 @@ func GenerateAccessibilityRequirements(
139139
driverName string,
140140
pvcName string,
141141
allowedTopologies []v1.TopologySelectorTerm,
142-
selectedNode *v1.Node) (*csi.TopologyRequirement, error) {
142+
selectedNode *v1.Node,
143+
strictTopology bool) (*csi.TopologyRequirement, error) {
143144
requirement := &csi.TopologyRequirement{}
144145

145146
var (
146-
selectedCSINode *storage.CSINode
147-
err error
147+
selectedCSINode *storage.CSINode
148+
selectedTopology topologyTerm
149+
requisiteTerms []topologyTerm
150+
err error
148151
)
149152

150153
// 1. Get CSINode for the selected node
@@ -158,20 +161,57 @@ func GenerateAccessibilityRequirements(
158161
// This should only happen if the Node is on a pre-1.14 version
159162
return nil, nil
160163
}
164+
topologyKeys := getTopologyKeys(selectedCSINode, driverName)
165+
if len(topologyKeys) == 0 {
166+
// The scheduler selected a node with no topology information.
167+
// This can happen if:
168+
//
169+
// * the node driver is not deployed on all nodes.
170+
// * the node driver is being restarted and has not re-registered yet. This should be
171+
// temporary and a retry should eventually succeed.
172+
//
173+
// Returning an error in provisioning will cause the scheduler to retry and potentially
174+
// (but not guaranteed) pick a different node.
175+
return nil, fmt.Errorf("no topology key found on CSINode %s", selectedCSINode.Name)
176+
}
177+
var isMissingKey bool
178+
selectedTopology, isMissingKey = getTopologyFromNode(selectedNode, topologyKeys)
179+
if isMissingKey {
180+
return nil, fmt.Errorf("topology labels from selected node %v does not match topology keys from CSINode %v", selectedNode.Labels, topologyKeys)
181+
}
182+
183+
if strictTopology {
184+
// Make sure that selected node topology is in allowed topologies list
185+
if len(allowedTopologies) != 0 {
186+
allowedTopologiesFlatten := flatten(allowedTopologies)
187+
found := false
188+
for _, t := range allowedTopologiesFlatten {
189+
if t.equal(selectedTopology) {
190+
found = true
191+
break
192+
}
193+
}
194+
if !found {
195+
return nil, fmt.Errorf("selected node '%q' topology '%v' is not in allowed topologies: %v", selectedNode.Name, selectedTopology, allowedTopologiesFlatten)
196+
}
197+
}
198+
// Only pass topology of selected node.
199+
requisiteTerms = append(requisiteTerms, selectedTopology)
200+
}
161201
}
162202

163203
// 2. Generate CSI Requisite Terms
164-
var requisiteTerms []topologyTerm
165-
if len(allowedTopologies) == 0 {
166-
// Aggregate existing topologies in nodes across the entire cluster.
167-
var err error
168-
requisiteTerms, err = aggregateTopologies(kubeClient, driverName, selectedCSINode)
169-
if err != nil {
170-
return nil, err
204+
if len(requisiteTerms) == 0 {
205+
if len(allowedTopologies) != 0 {
206+
// Distribute out one of the OR layers in allowedTopologies
207+
requisiteTerms = flatten(allowedTopologies)
208+
} else {
209+
// Aggregate existing topologies in nodes across the entire cluster.
210+
requisiteTerms, err = aggregateTopologies(kubeClient, driverName, selectedCSINode)
211+
if err != nil {
212+
return nil, err
213+
}
171214
}
172-
} else {
173-
// Distribute out one of the OR layers in allowedTopologies
174-
requisiteTerms = flatten(allowedTopologies)
175215
}
176216

177217
// It might be possible to reach here if:
@@ -202,20 +242,19 @@ func GenerateAccessibilityRequirements(
202242
preferredTerms = sortAndShift(requisiteTerms, nil, i)
203243
} else {
204244
// Delayed binding, use topology from that node to populate preferredTerms
205-
topologyKeys := getTopologyKeys(selectedCSINode, driverName)
206-
selectedTopology, isMissingKey := getTopologyFromNode(selectedNode, topologyKeys)
207-
if isMissingKey {
208-
return nil, fmt.Errorf("topology labels from selected node %v does not match topology keys from CSINode %v", selectedNode.Labels, topologyKeys)
209-
}
210-
211-
preferredTerms = sortAndShift(requisiteTerms, selectedTopology, 0)
212-
if preferredTerms == nil {
213-
// Topology from selected node is not in requisite. This case should never be hit:
214-
// - If AllowedTopologies is specified, the scheduler should choose a node satisfying the
215-
// constraint.
216-
// - Otherwise, the aggregated topology is guaranteed to contain topology information from the
217-
// selected node.
218-
return nil, fmt.Errorf("topology %v from selected node %q is not in requisite: %v", selectedTopology, selectedNode.Name, requisiteTerms)
245+
if strictTopology {
246+
// In case of strict topology, preferred = requisite
247+
preferredTerms = requisiteTerms
248+
} else {
249+
preferredTerms = sortAndShift(requisiteTerms, selectedTopology, 0)
250+
if preferredTerms == nil {
251+
// Topology from selected node is not in requisite. This case should never be hit:
252+
// - If AllowedTopologies is specified, the scheduler should choose a node satisfying the
253+
// constraint.
254+
// - Otherwise, the aggregated topology is guaranteed to contain topology information from the
255+
// selected node.
256+
return nil, fmt.Errorf("topology %v from selected node %q is not in requisite: %v", selectedTopology, selectedNode.Name, requisiteTerms)
257+
}
219258
}
220259
}
221260
requirement.Preferred = toCSITopology(preferredTerms)

0 commit comments

Comments
 (0)