RFC: add (path) "prefix" option to GPU plugin for scalability testing #1104
Conversation
Can you give an example yaml of the deployment?
I don't really see much benefit in the prefix; I see complications and unnecessary logic.
You should primarily aim to fake the resource with its existing name, without breaking dependencies on the existing resource name.
The pod spec is just an example of the changes. When using it in a cluster that also runs the real GPU plugin, you would add a nodeSelector to the DaemonSet to constrain the faked GPU plugin to non-GPU nodes. With the example configMap, the nodeSelector for a faked GPU workload [1] would ask for nodes where the "gpu.intel.com/platform_fake_DG1.present" label is true, and real workloads would need to ask for a non-fake GPU label (see the sketch below). Eventually there would be the following PRs:
[1] If the fake workload is one from the "fakedev-exporter" project, it can fake a suitable GPU load for the (fake) GPU assigned to it, which allows testing further GPU-related k8s components.
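A minimal sketch of the scheduling side, assuming the label and resource names above; the non-GPU node label, pod name and image reference are hypothetical:

```yaml
# Fake GPU plugin DaemonSet fragment: constrain it to non-GPU nodes.
# "example.com/no-gpu" is a hypothetical, manually applied node label.
spec:
  template:
    spec:
      nodeSelector:
        example.com/no-gpu: "true"
---
# Fake GPU workload: ask for nodes advertising the fake platform label,
# and request the (fake) i915 resource provided there.
apiVersion: v1
kind: Pod
metadata:
  name: fake-gpu-workload            # hypothetical name
spec:
  nodeSelector:
    gpu.intel.com/platform_fake_DG1.present: "true"
  containers:
  - name: workload
    image: fakedev-exporter:latest   # hypothetical image reference
    resources:
      limits:
        gpu.intel.com/i915: 1
```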
Devices can be faked for scalability testing when non-standard paths are used (GPU plugin code assumes container paths to match host paths, and the container runtime prevents creating fake files under the real paths). Note: If one wants to run both the normal GPU plugin and the faked one in the same cluster, all nodes providing fake "i915" resources should be labeled differently from the ones with the real GPU plugin + devices, so that real GPU workloads can be limited to the correct nodes with a suitable nodeSelector. Signed-off-by: Eero Tamminen <[email protected]>
Signed-off-by: Eero Tamminen <[email protected]>
Based on an input JSON file
The fake devfs directory is mounted from the host so the OCI runtime can "mount" device files also into workloads requesting fake devices. This means those files can persist over the fake GPU plugin's lifetime, so files from earlier runs need to be removed, as they may not match. Also, the DaemonSet restarts failing init containers, so errors about directories generated on a previous generator run would prevent getting logs of the real error from the first generator run.
Represent fake GPU devices with null devices: https://www.kernel.org/doc/Documentation/admin-guide/devices.txt The real devfs check also needed changing, and removal warnings were simplified, as there's always just one entry.
Signed-off-by: Eero Tamminen <[email protected]>
With the latest devices release.
Signed-off-by: Eero Tamminen <[email protected]>
Signed-off-by: Eero Tamminen <[email protected]>
Signed-off-by: Eero Tamminen <[email protected]>
Pushed fixes for the CI complaints (including some whitespace complaints about pre-existing GPU plugin code).
Overall I think this looks good. I'm still going to have a nitpicking round if I find some style issues or so but in the bigger picture this is excellent.
immutable: false
data:
  fakedev.json: |-
    {
Should the base object be an array? In case we ever want to define a heterogeneous fake system.
If you mean different GPUs on different nodes, one could just run multiple fake plugins, restricted to different nodes. Or the code could be changed to read multiple JSON files, and if a JSON file includes new include & exclude keys listing node names, that code would compare them against a specified env var value.
If you mean different GPUs on the same node, there could be a device offset key in the JSON, with the code changed to parse multiple config files.
Both of the above changes would be fully backwards compatible, i.e. they could be done later.
=> If needed, I'll add those features later on, but keep the code simple for now.
PS. I do not think changing the API later would be a problem either, as this is just a (scalability) test tool; a config API change only requires a matching generator version, and one gets an error if they do not match.
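Purely to illustrate the extension ideas above (none of these keys exist in the current config; every name here is hypothetical), the JSON could later grow roughly like this while staying backwards compatible:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fakedev-config                 # hypothetical name
data:
  # Hypothetical future keys: NodeInclude/NodeExclude would be compared against a
  # node-name env var, and DevOffset would let several config files share one node.
  fakedev.json: |-
    {
      "NodeInclude": ["worker-1", "worker-2"],
      "NodeExclude": [],
      "DevOffset": 0
    }
```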
I like that it's all deployable from k8s. No need to ssh into a node and run the generator there, etc. Overall, it looks good!
As suggested by Ukri. Signed-off-by: Eero Tamminen <[email protected]>
Give more detailed logging for the most likely failure, as MkNod() device node creation can fail as a normal user. Additional error checking done in the new dir removal helper function fixes Ukri's review comments. There's now an error if the to-be-removed fake sysfs has more content than expected (earlier such a check was done only for the fake devfs content). Signed-off-by: Eero Tamminen <[email protected]>
Noticed by Tuomas. Signed-off-by: Eero Tamminen <[email protected]>
With the above comments addressed, do you have any advice on where to put the generator code, container & example config? (It's probably best to leave just the platform item and remove the rest of the keys from the Capabilities map in its config.)
For the locations: the YAML would probably go to /deployments/fake-gpu-plugin (or similar).
I think YAML is best as a new kustomize dir for the GPU plugin (see the sketch below).
Including it in the gpu_nfdhook/ directory and container image would make things simpler => I'll do that and split the rest of the stuff into separate PRs once Mikko fixes the SGX CI builds. As it has no extra deps and is only 3.5MB when statically compiled, the image size increase and its impact on GPU plugin deployment could be fine I guess, although the generator is used only for testing. However:
[1] I.e. that functionality would either be included in the GPU plugin itself, or it would run as a GPU plugin sidecar to update the NFD feature file whenever GPUs change. So I think it still makes sense to run the generator in a separate init container, although it would be in the same image as the NFD hook.
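For reference, a rough sketch of how such a kustomize dir could look; the paths and file names here are placeholders, not the final layout:

```yaml
# deployments/gpu_plugin/overlays/fake_devices/kustomization.yaml   (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base                          # the normal GPU plugin deployment
- fakedev-configmap.yaml              # JSON config for the fake device file generator
patchesStrategicMerge:
- add-fakedev-initcontainer.yaml      # adds the generator init container and the "-prefix" arg
```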
@tkatila Project container templating does not support building or installing multiple binaries to the same container image, so I need to put the generator into a separate directory / container after all. Maybe "gpu_fakedev"?
Sounds fine to me.
Closing now that the features discussed here have their own PRs.
This PR adds a `-prefix` option to the GPU plugin, which allows it to read faked devfs and sysfs GPU device files (please check the commit messages for details). The PR also includes a tool for generating such files, which I've tested with the (included) modified GPU plugin / pod spec in a k8s cluster. Those are included just for reference, and should be split into separate PR(s) before this is merged.
For now, they are under a "fakedev" subdir under gpu_plugin/. What would be the best place for the generator code, maybe `cmd/gpu_fake_devs/generator.go`?
A GPU plugin pod running with `-prefix` will have that generator as its first init container. For NFD labeling to work properly with faked devices, `gpu_nfdhook` must be run in the GPU plugin pod context, not copied to the host as an NFD hook, as is done by the modified GPU plugin pod spec (while `gpu_nfdhook` does not need changes for now, see [1]).
With these, one can fake an arbitrary number of devices, which can be used to do scalability testing both for the GPU plugin and for things depending on it (see the sketch after the links below).
[1] Links:
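To make the overall arrangement concrete, here is a rough sketch of the kind of DaemonSet wiring described above; the prefix value, host path, image tags and container names are assumptions rather than the exact spec from this PR, and unrelated mounts (e.g. the kubelet device-plugin socket) are omitted:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin-fake               # hypothetical name
spec:
  selector:
    matchLabels:
      app: intel-gpu-plugin-fake
  template:
    metadata:
      labels:
        app: intel-gpu-plugin-fake
    spec:
      initContainers:
      # The generator runs first and creates the fake devfs/sysfs trees under the
      # prefix, based on the JSON config from the ConfigMap.
      - name: fakedev-generator              # hypothetical name
        image: intel/intel-gpu-fakedev:devel # hypothetical image
        volumeMounts:
        - name: fakedev-config
          mountPath: /etc/fakedev
        - name: fake-root
          mountPath: /fake
      containers:
      - name: intel-gpu-plugin
        image: intel/intel-gpu-plugin:devel  # tag assumed
        args:
        - "-prefix=/fake"                    # assumed value format; points the plugin at the faked paths
        volumeMounts:
        - name: fake-root
          mountPath: /fake
      volumes:
      - name: fakedev-config
        configMap:
          name: fakedev-config               # hypothetical name
      - name: fake-root
        hostPath:
          # From the host, so the runtime can "mount" the fake device files
          # also into workloads requesting the fake resources.
          path: /tmp/fake-gpu                # hypothetical host path
          type: DirectoryOrCreate
```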