From 59dcfa2474e7d77a2360eb06ec949f61ff997960 Mon Sep 17 00:00:00 2001
From: Benjamin Jee
Date: Mon, 23 Oct 2023 14:58:56 -0700
Subject: [PATCH 1/4] Add event batch processing results and rerun test

---
 tests/reconfig/results/v1.0.0.md | 45 ++++++++++++++++++++++----------
 tests/reconfig/setup.md          | 38 +++++++++++++++++----------
 2 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/tests/reconfig/results/v1.0.0.md b/tests/reconfig/results/v1.0.0.md
index 803ede268f..8d620c75a7 100644
--- a/tests/reconfig/results/v1.0.0.md
+++ b/tests/reconfig/results/v1.0.0.md
@@ -3,8 +3,10 @@

 - [Reconfiguration testing Results](#reconfiguration-testing-results)
   - [Test environment](#test-environment)
-  - [Results Table](#results-table)
-  - [NumResources -\> Total Resources](#numresources---total-resources)
+  - [Results Tables](#results-tables)
+    - [NGINX Reloads and Time to Ready](#nginx-reloads-and-time-to-ready)
+    - [Event Batch Processing](#event-batch-processing)
+  - [NumResources -> Total Resources](#numresources---total-resources)
   - [Observations](#observations)

@@ -14,27 +16,42 @@ GKE cluster:
 - Node count: 3
 - Instance Type: e2-medium
-- k8s version: 1.27.4-gke.900
-- Zone: europe-west2-b
+- k8s version: 1.27.3-gke.100
+- Zone: us-central1-c
 - Total vCPUs: 6
 - Total RAM: 12GB
 - Max pods per node: 110

 NGF deployment:

-- NGF version: edge - git commit 72b6c6ef8915c697626eeab88fdb6a3ce15b8da0
+- NGF version: edge - git commit 29b45e38bacd7c4f22834938105e3cda4f29f6d1
 - NGINX Version: 1.25.2

-## Results Table
+## Results Tables
+
+### NGINX Reloads and Time to Ready

 | Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) |
-| ----------- | ------------ | -------------------- | ------------------------ | ------------- | -------------------------- |
-| 1 | 30 | 5 | 5 | 2 | 166 |
-| 1 | 150 | 7 | 7 | 2 | 353 |
-| 2 | 30 | 21 | <1 | 30 | 142 |
-| 2 | 150 | 123 | <1 | 46 | 190 |
-| 3 | 30 | <1 | <1 | 93 | 137 |
-| 3 | 150 | 1 | 1 | 453 | 127 |
+| ----------- | ------------ |----------------------|-------------------------|---------------|----------------------------|
+| 1 | 30 | 1 | 1 | 2 | 191 |
+| 1 | 150 | 2 | 2 | 2 | 440 |
+| 2 | 30 | 50 | <1 | 93 | 162 |
+| 2 | 150 | 208 | <1 | 396 | 281 |
+| 3 | 30 | 1 | 1 | 93 | 129 |
+| 3 | 150 | 1 | 1 | 453 | 130 |
+
+
+### Event Batch Processing
+
+| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <=500ms | <= 1000ms |
+|-------------|--------------|-------------------|--------------------------------------|---------|-----------|
+| 1 | 30 | 69 | 6.232 | 100% | 100% |
+| 1 | 150 | 309 | 3.638 | 99.68% | 100% |
+| 2 | 30 | 465 | 38.759 | 100% | 100% |
+| 2 | 150 | 1941 | 68.539 | 98.51% | 100% |
+| 3 | 30 | 374 | 36.834 | 99.73% | 99.73% |
+| 3 | 150 | 1812 | 40.411 | 99.94% | 99.94% |
+

 ## NumResources -> Total Resources

 | NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Total Resources |
@@ -55,7 +72,7 @@ NGF deployment:

    Issue filed: https://github.com/nginxinc/nginx-gateway-fabric/issues/1123

-3. All reloads were in the <500ms bucket. A slight increase in the reload time based on number of configured resources
+3. All NGINX reloads were in the <500ms bucket. A slight increase in the reload time based on number of configured resources
    resulting in NGINX configuration changes was observed.

 4. No errors (NGF or NGINX) were observed in any test run.
diff --git a/tests/reconfig/setup.md b/tests/reconfig/setup.md
index 4ad3e70484..50033d2608 100644
--- a/tests/reconfig/setup.md
+++ b/tests/reconfig/setup.md
@@ -13,8 +13,8 @@

 ## Goals

-- Measure how long it takes NGF to reconfigure NGINX when a number of Gateway API and referenced core Kubernetes
-  resources are created at once.
+- Measure how long it takes NGF to reconfigure NGINX and update statuses when a number of Gateway API and
+  referenced core Kubernetes resources are created at once.
 - Two runs of each test should be ran with differing numbers of resources. Each run will deploy:
   - a single Gateway, Secret, and ReferenceGrant resources
   - `x+1` number of namespaces
@@ -38,7 +38,8 @@
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v0.8.1/standard-install.yaml
    ```

-3. Deploy NGF from edge using Helm install (NOTE: For Test 1, deploy AFTER resources):
+3. Deploy NGF from edge using Helm install and wait for LoadBalancer Service to be ready
+   (NOTE: For Test 1, deploy AFTER resources):

    ```console
    helm install my-release oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric --version 0.0.0-edge \
@@ -65,10 +66,20 @@
    kubectl port-forward $GW_POD -n nginx-gateway 9113:9113 &
    ```

-6. Measure Time To Ready as described in each test, get the reload count, and get the average NGINX reload duration.
-   The average reload duration can be computed by taking the `nginx_gateway_fabric_nginx_reloads_milliseconds_sum`
-   metric value and dividing it by the `nginx_gateway_fabric_nginx_reloads_milliseconds_count` metric value.
-7. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomolies or outliers.
+6. Measure NGINX Reloads and Time to Ready Results
+   1. TimeToReadyTotal as described in each test - NGF logs.
+   2. TimeToReadyAvgSingle which is the average time between updating any resource and the
+      NGINX configuration being reloaded - NGF logs
+   3. NGINX Reload count - metrics.
+   4. Average NGINX reload duration - metrics.
+      1. The average reload duration can be computed by taking the `nginx_gateway_fabric_nginx_reloads_milliseconds_sum`
+         metric value and dividing it by the `nginx_gateway_fabric_nginx_reloads_milliseconds_count` metric value.
+7. Measure Event Batch Processing Results
+   1. Event Batch Total - metrics
+   2. Average Event Batch Processing duration - metrics
+      1. The average event batch processing duration can be computed by taking the `nginx_gateway_fabric_event_batch_processing_milliseconds_sum`
+         metric value and dividing it by the `nginx_gateway_fabric_event_batch_processing_milliseconds_count` metric value.
+8. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomalies or outliers.

 ## Tests

@@ -79,8 +90,8 @@
       e.g. `cd scripts && bash create-resources-gw-last.sh 30`. The script will deploy backend apps and services, wait
       60 seconds for them to be ready, and deploy 1 Gateway, 1 RefGrant, 1 Secret, and HTTPRoutes.
    2. Deploy NGF
-   3. Check logs for time it takes from start-up -> config written and NGINX reloaded. Get reload count and average reload
-      duration from metrics and logs.
+   3. Measure TimeToReadyTotal as the time it takes from start-up -> config written and
+      NGINX reloaded. Measure the other results as described in steps 6-7 of the [Setup](#setup) section.

 ### Test 2: Start NGF, deploy Gateway, create many resources attached to GW

@@ -89,9 +100,8 @@
    2. Run the provided script with the required number of resources,
       e.g. `cd scripts && bash create-resources-routes-last.sh 30`. The script will deploy backend apps and services, wait
       60 seconds for them to be ready, and deploy 1 Gateway, 1 Secret, 1 RefGrant, and HTTPRoutes at the same time.
-   3. Check logs for time it takes from NGF receiving first resource update -> final config written, and NGINX's final
-      reload. Check logs for average individual HTTPRoute TTR also. Get reload count and average reload duration from
-      metrics and logs.
+   3. Measure TimeToReadyTotal as the time it takes from NGF receiving the first HTTPRoute resource update -> final
+      config written and NGINX reloaded. Measure the other results as described in steps 6-7 of the [Setup](#setup) section.

 ### Test 3: Start NGF, create many resources attached to a Gateway, deploy the Gateway

@@ -101,5 +111,5 @@
       e.g. `cd scripts && bash create-resources-gw-last.sh 30`. The script will deploy the namespaces, backend apps and
       services, 1 Secret, 1 ReferenceGrant, and the HTTPRoutes; wait 60 seconds for the backend apps to be ready, and
       then deploy 1 Gateway for all HTTPRoutes.
-   3. Check logs for time it takes from NGF receiving gateway resource -> config written and NGINX reloaded. Get reload
-      count and average reload duration from metrics and logs.
+   3. Measure TimeToReadyTotal as the time it takes from NGF receiving gateway resource -> config written and NGINX reloaded.
+      Measure the other results as described in steps 6-7 of the [Setup](#setup) section.

From bed318da159e1c5a15ec0f17aec083fa418783fa Mon Sep 17 00:00:00 2001
From: Benjamin Jee
Date: Tue, 24 Oct 2023 08:58:41 -0700
Subject: [PATCH 2/4] Small nit fixes

---
 tests/reconfig/results/v1.0.0.md | 16 ++++++++--------
 tests/reconfig/setup.md          |  6 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/tests/reconfig/results/v1.0.0.md b/tests/reconfig/results/v1.0.0.md
index 8d620c75a7..ced75f6ff2 100644
--- a/tests/reconfig/results/v1.0.0.md
+++ b/tests/reconfig/results/v1.0.0.md
@@ -31,14 +31,14 @@ NGF deployment:

 ### NGINX Reloads and Time to Ready

-| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) |
-| ----------- | ------------ |----------------------|-------------------------|---------------|----------------------------|
-| 1 | 30 | 1 | 1 | 2 | 191 |
-| 1 | 150 | 2 | 2 | 2 | 440 |
-| 2 | 30 | 50 | <1 | 93 | 162 |
-| 2 | 150 | 208 | <1 | 396 | 281 |
-| 3 | 30 | 1 | 1 | 93 | 129 |
-| 3 | 150 | 1 | 1 | 453 | 130 |
+| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms)|
+| ----------- | ------------ |----------------------|--------------------------|---------------|---------------------------|
+| 1 | 30 | 1 | 1 | 2 | 191 |
+| 1 | 150 | 2 | 2 | 2 | 440 |
+| 2 | 30 | 50 | <1 | 93 | 162 |
+| 2 | 150 | 208 | <1 | 396 | 281 |
+| 3 | 30 | 1 | 1 | 93 | 129 |
+| 3 | 150 | 1 | 1 | 453 | 130 |


 ### Event Batch Processing
diff --git a/tests/reconfig/setup.md b/tests/reconfig/setup.md
index 50033d2608..52ded61509 100644
--- a/tests/reconfig/setup.md
+++ b/tests/reconfig/setup.md
@@ -69,14 +69,14 @@
 6. Measure NGINX Reloads and Time to Ready Results
    1. TimeToReadyTotal as described in each test - NGF logs.
    2. TimeToReadyAvgSingle which is the average time between updating any resource and the
-      NGINX configuration being reloaded - NGF logs
+      NGINX configuration being reloaded - NGF logs.
    3. NGINX Reload count - metrics.
    4. Average NGINX reload duration - metrics.
       1. The average reload duration can be computed by taking the `nginx_gateway_fabric_nginx_reloads_milliseconds_sum`
          metric value and dividing it by the `nginx_gateway_fabric_nginx_reloads_milliseconds_count` metric value.
 7. Measure Event Batch Processing Results
-   1. Event Batch Total - metrics
-   2. Average Event Batch Processing duration - metrics
+   1. Event Batch Total - metrics.
+   2. Average Event Batch Processing duration - metrics.
       1. The average event batch processing duration can be computed by taking the `nginx_gateway_fabric_event_batch_processing_milliseconds_sum`
          metric value and dividing it by the `nginx_gateway_fabric_event_batch_processing_milliseconds_count` metric value.
 8. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomalies or outliers.

From 92a217907216f1c8e6d8f8523e6aca8d6f032488 Mon Sep 17 00:00:00 2001
From: Benjamin Jee
Date: Tue, 24 Oct 2023 09:27:48 -0700
Subject: [PATCH 3/4] Fix file structure

---
 tests/reconfig/results/{v1.0.0.md => 1.0.0/1.0.0.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename tests/reconfig/results/{v1.0.0.md => 1.0.0/1.0.0.md} (100%)

diff --git a/tests/reconfig/results/v1.0.0.md b/tests/reconfig/results/1.0.0/1.0.0.md
similarity index 100%
rename from tests/reconfig/results/v1.0.0.md
rename to tests/reconfig/results/1.0.0/1.0.0.md

From a4c92e115ad3fd332839edf07e6494106d0cb721 Mon Sep 17 00:00:00 2001
From: Benjamin Jee
Date: Tue, 24 Oct 2023 10:44:17 -0700
Subject: [PATCH 4/4] Add reload time distribution

---
 tests/reconfig/results/1.0.0/1.0.0.md | 36 +++++++++++++--------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/tests/reconfig/results/1.0.0/1.0.0.md b/tests/reconfig/results/1.0.0/1.0.0.md
index ced75f6ff2..30524405a1 100644
--- a/tests/reconfig/results/1.0.0/1.0.0.md
+++ b/tests/reconfig/results/1.0.0/1.0.0.md
@@ -31,26 +31,26 @@ NGF deployment:

 ### NGINX Reloads and Time to Ready

-| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms)|
-| ----------- | ------------ |----------------------|--------------------------|---------------|---------------------------|
-| 1 | 30 | 1 | 1 | 2 | 191 |
-| 1 | 150 | 2 | 2 | 2 | 440 |
-| 2 | 30 | 50 | <1 | 93 | 162 |
-| 2 | 150 | 208 | <1 | 396 | 281 |
-| 3 | 30 | 1 | 1 | 93 | 129 |
-| 3 | 150 | 1 | 1 | 453 | 130 |
+| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms |
+|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------|
+| 1 | 30 | 1 | 1 | 2 | 191 | 100% | 100% |
+| 1 | 150 | 2 | 2 | 2 | 440 | 50% | 100% |
+| 2 | 30 | 50 | <1 | 93 | 162 | 100% | 100% |
+| 2 | 150 | 208 | <1 | 396 | 281 | 96.46% | 100% |
+| 3 | 30 | 1 | 1 | 93 | 129 | 100% | 100% |
+| 3 | 150 | 1 | 1 | 453 | 130 | 100% | 100% |


 ### Event Batch Processing

-| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <=500ms | <= 1000ms |
-|-------------|--------------|-------------------|--------------------------------------|---------|-----------|
-| 1 | 30 | 69 | 6.232 | 100% | 100% |
-| 1 | 150 | 309 | 3.638 | 99.68% | 100% |
-| 2 | 30 | 465 | 38.759 | 100% | 100% |
-| 2 | 150 | 1941 | 68.539 | 98.51% | 100% |
-| 3 | 30 | 374 | 36.834 | 99.73% | 99.73% |
-| 3 | 150 | 1812 | 40.411 | 99.94% | 99.94% |
+| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms |
+|-------------|--------------|-------------------|--------------------------------------|----------|-----------|
+| 1 | 30 | 69 | 6.232 | 100% | 100% |
+| 1 | 150 | 309 | 3.638 | 99.68% | 100% |
+| 2 | 30 | 465 | 38.759 | 100% | 100% |
+| 2 | 150 | 1941 | 68.539 | 98.51% | 100% |
+| 3 | 30 | 374 | 36.834 | 99.73% | 99.73% |
+| 3 | 150 | 1812 | 40.411 | 99.94% | 99.94% |


 ## NumResources -> Total Resources
@@ -72,7 +72,7 @@ NGF deployment:

    Issue filed: https://github.com/nginxinc/nginx-gateway-fabric/issues/1123

-3. All NGINX reloads were in the <500ms bucket. A slight increase in the reload time based on number of configured resources
-   resulting in NGINX configuration changes was observed.
+3. Majority of NGINX reloads were in the <= 500ms bucket, with all of them being in the <= 1000ms bucket. An increase
+   in the reload time based on number of configured resources resulting in NGINX configuration changes was observed.

 4. No errors (NGF or NGINX) were observed in any test run.
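
Reviewer note: the averages and the `<= 500ms` / `<= 1000ms` columns in the results tables can be derived from the histogram metrics named in `setup.md`. The following is a minimal sketch of that arithmetic, not the procedure the author necessarily used. It assumes the metrics are exposed in the standard Prometheus text format, that the histogram also exposes `_bucket` series with `le="500"` and `le="1000"` labels (the patch only names the `_sum` and `_count` series), and that `SAMPLE` is a hand-written illustration chosen to reproduce the Test 1 / 150 row rather than captured output. `parse_metrics` and `summarize` are hypothetical helper names.

```python
import re

# Hand-written illustration, NOT real scrape output. The values are chosen so the
# result matches the Test 1 / NumResources 150 row (2 reloads, 440 ms average, 50% <= 500ms).
# Assumption: the histogram exposes `_bucket` series with these `le` labels; real
# output may carry additional labels.
SAMPLE = """\
# TYPE nginx_gateway_fabric_nginx_reloads_milliseconds histogram
nginx_gateway_fabric_nginx_reloads_milliseconds_bucket{le="500"} 1
nginx_gateway_fabric_nginx_reloads_milliseconds_bucket{le="1000"} 2
nginx_gateway_fabric_nginx_reloads_milliseconds_bucket{le="+Inf"} 2
nginx_gateway_fabric_nginx_reloads_milliseconds_sum 880
nginx_gateway_fabric_nginx_reloads_milliseconds_count 2
"""

# One sample per line: metric name, optional {labels}, then the value.
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)


def parse_metrics(text):
    """Parse Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match:
            continue
        labels = dict(re.findall(r'(\w+)="([^"]*)"', match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def summarize(text, base):
    """Compute the count, the average (sum / count), and cumulative bucket percentages."""
    samples = parse_metrics(text)
    total = sum(v for n, _, v in samples if n == base + "_sum")
    count = sum(v for n, _, v in samples if n == base + "_count")

    def percent_within(le):
        # Histogram buckets are cumulative, so the le="500" bucket already counts
        # every observation that took 500 ms or less.
        hits = sum(v for n, lbl, v in samples
                   if n == base + "_bucket" and lbl.get("le") == le)
        return 100.0 * hits / count if count else 0.0

    return {
        "count": count,
        "avg_ms": total / count if count else 0.0,
        "pct_le_500": percent_within("500"),
        "pct_le_1000": percent_within("1000"),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE, "nginx_gateway_fabric_nginx_reloads_milliseconds"))
```

The same helper should apply to the event batch numbers by passing `nginx_gateway_fabric_event_batch_processing_milliseconds` as `base`. In a real run the input text would come from the port-forwarded metrics port (9113 in `setup.md`); the exact endpoint path is not stated in the patch.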