Conversation

@qiliRedHat (Contributor)

@openshift-ci openshift-ci bot requested review from liqcui and svetsa-rh November 10, 2025 08:08
openshift-ci bot commented Nov 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qiliRedHat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 10, 2025
@qiliRedHat (Contributor, Author)

Test Result

# ./run_perf_test.sh 
--- Monitoring persistence is already configured ---
NAME          STATUS   AGE
dittybopper   Active   174m
--- Dittybopper is already installed ---
Prepare test...
Applying test pod to find the least usage node...
pod/test-pod created
pod/test-pod condition met
Cleaning up test pod...
pod "test-pod" deleted
Perf test will be run on node: ip-10-0-16-222.us-east-2.compute.internal
Label the perf test node ip-10-0-16-222.us-east-2.compute.internal with node-role.kubernetes.io/perf...
node/ip-10-0-16-222.us-east-2.compute.internal labeled
Generating the stress pod yaml...
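
As context for the node-selection step above: the script finds the least-used node by scheduling a throwaway test pod and reading where the scheduler placed it. An alternative sketch that picks the lowest-CPU worker directly from `kubectl top nodes` output (a hypothetical helper, not the script's actual logic):

```python
def least_cpu_node(top_output: str) -> str:
    """Return the node name with the lowest CPU usage (millicores)
    from `kubectl top nodes` output."""
    best_name, best_cpu = None, float("inf")
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        name, cpu, *_ = line.split()
        millicores = int(cpu.rstrip("m"))
        if millicores < best_cpu:
            best_name, best_cpu = name, millicores
    return best_name

sample = """NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-0-16-222.us-east-2.compute.internal 73m 2% 1842Mi 12%
ip-10-0-51-255.us-east-2.compute.internal 120m 3% 2035Mi 13%"""
print(least_cpu_node(sample))  # ip-10-0-16-222.us-east-2.compute.internal
```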
--- Starting Test: Feature enabled = false, Stress Type = cpu ---
Collecting idle proxy metrics...
Applying cpu stress workload...
pod/cpu-stress-pod created
pod/cpu-stress-pod condition met
cpu stress workload is ready
Fri Nov  7 12:39:47 PM UTC 2025
Collecting proxy metrics under load after 180 seconds...
Cleaning up workload...
Fri Nov  7 12:42:47 PM UTC 2025
pod "cpu-stress-pod" deleted
--- Finished Test: baseline_feature_disabled_cpu_stress ---
--- Sleep 180s to let the previous stress to cool down ---
--- Starting Test: Feature enabled = false, Stress Type = memory ---
Collecting idle proxy metrics...
Applying memory stress workload...
pod/memory-stress-pod created
pod/memory-stress-pod condition met
memory stress workload is ready
Fri Nov  7 12:45:51 PM UTC 2025
Collecting proxy metrics under load after 180 seconds...
Cleaning up workload...
Fri Nov  7 12:48:52 PM UTC 2025
pod "memory-stress-pod" deleted
--- Finished Test: baseline_feature_disabled_memory_stress ---
--- Sleep 180s to let the previous stress to cool down ---
--- Starting Test: Feature enabled = false, Stress Type = io ---
Collecting idle proxy metrics...
Applying io stress workload...
pod/io-stress-pod created
pod/io-stress-pod condition met
io stress workload is ready
Fri Nov  7 12:52:24 PM UTC 2025
Collecting proxy metrics under load after 180 seconds...
Cleaning up workload...
Fri Nov  7 12:55:25 PM UTC 2025
pod "io-stress-pod" deleted
--- Finished Test: baseline_feature_disabled_io_stress ---
--- Sleep 180s to let the previous stress to cool down ---
==========================================
  OpenShift PSI Enablement Script
==========================================

[INFO] Checking prerequisites...
[SUCCESS] Prerequisites check passed

[INFO] Step 1: Checking current PSI status on worker nodes...
[INFO] Checking PSI status on all worker nodes...
[WARNING] PSI is NOT enabled on ip-10-0-16-222.us-east-2.compute.internal
[WARNING] PSI is NOT enabled on ip-10-0-51-255.us-east-2.compute.internal
[WARNING] PSI is NOT enabled on ip-10-0-84-37.us-east-2.compute.internal
[INFO] PSI is not fully enabled. Proceeding with enablement...

[INFO] Step 2: Creating MachineConfig YAML...
[INFO] Creating MachineConfig YAML: 99-worker-enable-psi.yaml
[SUCCESS] MachineConfig YAML created

[INFO] Step 3: Applying MachineConfig to cluster...
[INFO] Applying MachineConfig to cluster...
machineconfig.machineconfiguration.openshift.io/99-worker-enable-psi created
[SUCCESS] MachineConfig applied
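
The generated `99-worker-enable-psi.yaml` is not shown in the log; a MachineConfig that enables PSI typically just adds the `psi=1` kernel argument for the worker pool (a sketch with assumed field values, not the PR's exact file):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-enable-psi
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - psi=1
```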

[INFO] Current worker node status:
NAME                                        STATUS   ROLES         AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-16-222.us-east-2.compute.internal   Ready    perf,worker   9h    v1.34.1   10.0.16.222   <none>        Red Hat Enterprise Linux CoreOS 9.6.20251105-0 (Plow)   5.14.0-570.62.1.el9_6.x86_64   cri-o://1.34.1-4.rhaos4.21.git5780ac7.el9
ip-10-0-51-255.us-east-2.compute.internal   Ready    worker        9h    v1.34.1   10.0.51.255   <none>        Red Hat Enterprise Linux CoreOS 9.6.20251105-0 (Plow)   5.14.0-570.62.1.el9_6.x86_64   cri-o://1.34.1-4.rhaos4.21.git5780ac7.el9
ip-10-0-84-37.us-east-2.compute.internal    Ready    worker        9h    v1.34.1   10.0.84.37    <none>        Red Hat Enterprise Linux CoreOS 9.6.20251105-0 (Plow)   5.14.0-570.62.1.el9_6.x86_64   cri-o://1.34.1-4.rhaos4.21.git5780ac7.el9

--- Wait 120 seconds for mcp to start updating ---
[INFO] Step 4: Waiting for all worker nodes to be updated...
[INFO] Waiting for MachineConfigPool 'worker' to update...
[INFO] This may take 30-60 minutes. Workers will reboot one by one.
[INFO] MCP Status: Updated=0/3, Ready=0/3, Degraded=0
[INFO] Waiting 30 seconds before next check... (elapsed: 0s)
[INFO] MCP Status: Updated=1/3, Ready=1/3, Degraded=0
[INFO] Waiting 30 seconds before next check... (elapsed: 30s)
[INFO] MCP Status: Updated=1/3, Ready=1/3, Degraded=0
[INFO] Waiting 30 seconds before next check... (elapsed: 60s)
[INFO] MCP Status: Updated=1/3, Ready=1/3, Degraded=0
[INFO] Waiting 30 seconds before next check... (elapsed: 90s)
[INFO] MCP Status: Updated=1/3, Ready=1/3, Degraded=0
[INFO] Waiting 30 seconds before next check... (elapsed: 120s)
[INFO] MCP Status: Updated=2/3, Ready=2/3, Degraded=0
[INFO] Waiting 30 seconds before next check... (elapsed: 150s)
[INFO] MCP Status: Updated=2/3, Ready=2/3, Degraded=0
[INFO] Waiting 30 seconds before next check... (elapsed: 180s)
[INFO] MCP Status: Updated=2/3, Ready=2/3, Degraded=0
[INFO] Waiting 30 seconds before next check... (elapsed: 210s)
[INFO] MCP Status: Updated=2/3, Ready=2/3, Degraded=0
[INFO] Waiting 30 seconds before next check... (elapsed: 240s)
[INFO] MCP Status: Updated=3/3, Ready=3/3, Degraded=0
[SUCCESS] All worker nodes have been updated!
[SUCCESS] Worker node update completed successfully!

--- Wait 60 seconds for PSI to be ready on nodes---
[INFO] Step 5: Verifying PSI is enabled on all worker nodes...
[INFO] Checking PSI status on all worker nodes...
[SUCCESS] PSI is enabled on ip-10-0-16-222.us-east-2.compute.internal
[SUCCESS] PSI is enabled on ip-10-0-51-255.us-east-2.compute.internal
[SUCCESS] PSI is enabled on ip-10-0-84-37.us-east-2.compute.internal
[SUCCESS] ✅ PSI has been successfully enabled on all worker nodes!

==========================================
[SUCCESS] PSI Enablement Complete!
==========================================
[INFO] You can verify PSI on any worker node with:
  oc debug node/<node-name> -- chroot /host cat /proc/pressure/cpu

[INFO] MachineConfig created: 99-worker-enable-psi.yaml
[INFO] To remove this configuration later, run:
  oc delete machineconfig 99-worker-enable-psi
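
For reference, the `/proc/pressure/*` files read by the verification command above use the kernel's standard PSI line format (`some avg10=... avg60=... avg300=... total=...`). A minimal parser sketch for that format:

```python
def parse_psi(text: str) -> dict:
    """Parse /proc/pressure/{cpu,memory,io} content into nested dicts,
    keyed by 'some'/'full' and then by field name."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        result[kind] = {key: float(val)
                        for key, val in (f.split("=") for f in fields)}
    return result

sample = "some avg10=2.04 avg60=0.75 avg300=0.40 total=157656722"
print(parse_psi(sample)["some"]["avg10"])  # 2.04
```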

--- Sleep 10m to let the cluster to become stable after nodes reboot after enabling PSI ---
--- Starting Test: Feature enabled = true, Stress Type = cpu ---
Collecting idle proxy metrics...
Applying cpu stress workload...
pod/cpu-stress-pod created
pod/cpu-stress-pod condition met
cpu stress workload is ready
Fri Nov  7 01:16:50 PM UTC 2025
Collecting proxy metrics under load after 180 seconds...
Cleaning up workload...
Fri Nov  7 01:19:51 PM UTC 2025
pod "cpu-stress-pod" deleted
--- Finished Test: test_feature_enabled_cpu_stress ---
--- Sleep 180s to let the previous stress to cool down ---
--- Starting Test: Feature enabled = true, Stress Type = memory ---
Collecting idle proxy metrics...
Applying memory stress workload...
pod/memory-stress-pod created
pod/memory-stress-pod condition met
memory stress workload is ready
Fri Nov  7 01:22:54 PM UTC 2025
Collecting proxy metrics under load after 180 seconds...
Cleaning up workload...
Fri Nov  7 01:25:55 PM UTC 2025
pod "memory-stress-pod" deleted
--- Finished Test: test_feature_enabled_memory_stress ---
--- Sleep 180s to let the previous stress to cool down ---
--- Starting Test: Feature enabled = true, Stress Type = io ---
Collecting idle proxy metrics...
Applying io stress workload...
pod/io-stress-pod created
pod/io-stress-pod condition met
io stress workload is ready
Fri Nov  7 01:29:28 PM UTC 2025
Collecting proxy metrics under load after 180 seconds...
Cleaning up workload...
Fri Nov  7 01:32:28 PM UTC 2025
pod "io-stress-pod" deleted
--- Finished Test: test_feature_enabled_io_stress ---
--- Sleep 180s to let the previous stress to cool down ---
node/ip-10-0-16-222.us-east-2.compute.internal unlabeled
--- All tests completed. Logs are in ./perf_logs_20251107_123941 ---
--- Analyzing results... ---
--- Performance Analysis Summary by kubectl top node and /proxy/stats/summary---

### Stress Type: CPU

| Metric                 | Condition | Result (Baseline -> Test) |
|------------------------|-----------|---------------------------|
| **Node CPU (m)**       | Idle      | 73.00 -> 78.00 (+5.00 / +6.85%) |
| **Node CPU (m)**       | Load      | 1596.00 -> 1575.00 (-21.00 / -1.32%) |
| **Node Memory (MiB)**  | Idle      | 1842.00 -> 1920.00 (+78.00 / +4.23%) |
| **Node Memory (MiB)**  | Load      | 1975.00 -> 1996.00 (+21.00 / +1.06%) |
| **Kubelet CPU (m)**    | Idle      | 34.79 -> 35.66 (+0.88 / +2.52%) |
| **Kubelet CPU (m)**    | Load      | 40.01 -> 35.60 (-4.41 / -11.02%) |
| **Kubelet Memory (MiB)** | Idle      | 155.55 -> 152.24 (-3.31 / -2.13%) |
| **Kubelet Memory (MiB)** | Load      | 155.17 -> 156.57 (+1.41 / +0.91%) |

### Stress Type: IO

| Metric                 | Condition | Result (Baseline -> Test) |
|------------------------|-----------|---------------------------|
| **Node CPU (m)**       | Idle      | 79.00 -> 79.00 (+0.00 / +0.00%) |
| **Node CPU (m)**       | Load      | 172.00 -> 170.00 (-2.00 / -1.16%) |
| **Node Memory (MiB)**  | Idle      | 2035.00 -> 2020.00 (-15.00 / -0.74%) |
| **Node Memory (MiB)**  | Load      | 2029.00 -> 2014.00 (-15.00 / -0.74%) |
| **Kubelet CPU (m)**    | Idle      | 32.53 -> 41.28 (+8.75 / +26.91%) |
| **Kubelet CPU (m)**    | Load      | 39.70 -> 31.56 (-8.15 / -20.52%) |
| **Kubelet Memory (MiB)** | Idle      | 160.11 -> 164.11 (+4.00 / +2.50%) |
| **Kubelet Memory (MiB)** | Load      | 157.31 -> 161.72 (+4.41 / +2.80%) |

### Stress Type: MEMORY

| Metric                 | Condition | Result (Baseline -> Test) |
|------------------------|-----------|---------------------------|
| **Node CPU (m)**       | Idle      | 75.00 -> 77.00 (+2.00 / +2.67%) |
| **Node CPU (m)**       | Load      | 87.00 -> 82.00 (-5.00 / -5.75%) |
| **Node Memory (MiB)**  | Idle      | 1987.00 -> 1995.00 (+8.00 / +0.40%) |
| **Node Memory (MiB)**  | Load      | 10203.00 -> 10208.00 (+5.00 / +0.05%) |
| **Kubelet CPU (m)**    | Idle      | 32.69 -> 38.06 (+5.37 / +16.42%) |
| **Kubelet CPU (m)**    | Load      | 30.00 -> 34.81 (+4.82 / +16.06%) |
| **Kubelet Memory (MiB)** | Idle      | 153.03 -> 161.22 (+8.20 / +5.36%) |
| **Kubelet Memory (MiB)** | Load      | 159.75 -> 164.29 (+4.54 / +2.84%) |
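
The `(+delta / +pct%)` columns in the tables above follow a simple formula; a sketch of the formatting, assumed to mirror the analysis script's arithmetic:

```python
def fmt_delta(baseline: float, test: float) -> str:
    """Format 'baseline -> test (+delta / +pct%)' as in the summary tables."""
    delta = test - baseline
    pct = (delta / baseline * 100.0) if baseline else 0.0
    return f"{baseline:.2f} -> {test:.2f} ({delta:+.2f} / {pct:+.2f}%)"

print(fmt_delta(73.0, 78.0))  # 73.00 -> 78.00 (+5.00 / +6.85%)
```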

openshift-ci bot commented Nov 10, 2025

@qiliRedHat: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@qiliRedHat (Contributor, Author)

@ngopalak-redhat PTAL
