NOTE: This is work in progress due to missing ready-to-use dependencies.
While it's possible to use `fakedev-exporter` in a k8s cluster without a
device plugin (the YAMLs have a few comments about that), it's not really
that useful. Therefore the deployment relies on a (faked) GPU plugin
being present.
For now, one needs to:
- Build the "fake device generator" image manually from this pull request:
- Push the resulting image to a local registry, and apply the (WiP) k8s GPU plugin [1] integration with it, after updating the deployment image URL(s) accordingly:
If the cluster runs both fake and real plugin configurations, the fake GPU
device plugin config should specify a different label for the fake nodes,
which can be used with the `fakedev-exporter` (and its workload) deployment
`nodeSelector`, like the example deployments do.
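As a rough sketch (the label key and value are hypothetical placeholders, not names defined by the GPU plugin or `fakedev-exporter`), the pod template of such a deployment could then select the fake nodes like this:

```yaml
# Fragment of a Deployment pod template; the nodeSelector label
# below is only an example of a "fake node" label.
spec:
  template:
    spec:
      nodeSelector:
        example.com/fake-gpu: "true"   # label set only on fake-GPU nodes
```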
[1] Intel GPU device plugin v0.25.0 (or newer) has support for the
`-prefix` option required for GPU device faking. The GPU plugin uses NFD
(node-feature-discovery) for labeling the nodes, so NFD is also needed;
see the GPU plugin installation instructions.
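For reference, that option would be given in the (fake) GPU plugin DaemonSet container args roughly as below; the flag name comes from the plugin, but the container name and path are only placeholders for wherever the fake device generator creates its fake sysfs/devfs tree:

```yaml
# Fragment of the fake GPU plugin DaemonSet container spec; the
# container name and prefix path are illustrative placeholders.
      containers:
      - name: intel-gpu-plugin
        args:
        - "-prefix=/path/to/fake/device/tree"
```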
Finally, one needs to build the `fakedev-exporter` image, push it to some
registry, and update its URL in the `fakedev-*.yaml` files.
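In other words, after the push the image reference in the deployment pod spec should point at your own registry; the registry host and tag below are only placeholders:

```yaml
# Fragment of the fakedev-exporter deployment pod template; replace
# the registry host and tag with wherever the image was pushed.
      containers:
      - name: fakedev-exporter
        image: localhost:5000/fakedev-exporter:devel
```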
For metrics reporting to work, Prometheus and `fakedev-exporter` need to
run in the same namespace. If that's not the case, update everything shown
by `git grep monitoring`.
Workloads run in the `validation` namespace. If the fake GPU plugin
deployment did not provide that namespace, add it with:

`kubectl apply -f workloads/validation-namespace.yaml`
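That manifest is presumably nothing more than a plain namespace definition, along these lines:

```yaml
# Minimal sketch of what workloads/validation-namespace.yaml is
# expected to contain: just the "validation" namespace itself.
apiVersion: v1
kind: Namespace
metadata:
  name: validation
```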
Create the roles + services used by `fakedev-exporter`:

`kubectl apply -f common/`
Check that the `nodeSelector` value and the content of the selected
`fakedev-exporter` config really match [1] the platform name and memory
amounts provided by the fake GPU plugin config. Then start
`fakedev-exporter` on the nodes providing that specific fake GPU type:

`kubectl apply -f ./`
[1] Especially in the case of SR-IOV, matching things is trickier because a PF and its VFs typically have different amounts of memory, and only a subset of metrics is available for VFs.
Start a suitable batch of fake workloads (WLs) with the same
`nodeSelector` as `fakedev-exporter` uses:

`kubectl apply -f workloads/fakedev-workload-batch.yaml`
Each workload instance will then get one of the fake GPU devices provided
by the GPU plugin, and ask `fakedev-exporter` to generate GPU metrics based
on the specified fake load on that particular fake GPU; Prometheus then
pulls those metrics into its database.
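As an illustration of how a workload pod ends up with a fake device, each workload container requests one GPU resource from the device plugin. The resource name below assumes the Intel GPU plugin's `gpu.intel.com/i915` resource; adjust it if the fake plugin advertises something else:

```yaml
# Fragment of a fake workload container spec; each pod instance
# requests a single (fake) GPU device from the device plugin.
        resources:
          limits:
            gpu.intel.com/i915: 1
```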