blog: updated E2E best practices #392
Conversation
It would be great to get this published before KubeCon EU 2023, because then readers have the opportunity to ask questions on-site in person. @aojea: The intro, the architecture and the "next steps" are new. The rest is text that you already reviewed earlier for kubernetes/community#7021, just updated a bit to make it flow better in a blog post.
excellent
/lgtm
/assign @mrbobbytables for approval. Let's get this published before KubeCon, then folks can chat with me about it there.
`ginkgo.DeferCleanup` executes code in the more useful last-in-first-out order,
i.e. things that get set up first get removed last.
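The LIFO ordering can be illustrated with a plain Go sketch. The `cleanupStack` type and its methods below are hypothetical stand-ins, not the real Ginkgo API; only the ordering semantics mirror `ginkgo.DeferCleanup`:

```go
package main

import "fmt"

// cleanupStack is a hypothetical stand-in that mimics the
// last-in-first-out semantics of ginkgo.DeferCleanup:
// callbacks registered first run last.
type cleanupStack struct {
	callbacks []func()
}

// deferCleanup registers a teardown callback.
func (c *cleanupStack) deferCleanup(cb func()) {
	c.callbacks = append(c.callbacks, cb)
}

// run executes the callbacks in reverse registration order.
func (c *cleanupStack) run() {
	for i := len(c.callbacks) - 1; i >= 0; i-- {
		c.callbacks[i]()
	}
}

// runExample "sets up" a namespace, then a pod inside it, and
// returns the order in which the cleanups actually ran.
func runExample() []string {
	var order []string
	var c cleanupStack
	c.deferCleanup(func() { order = append(order, "delete namespace") })
	c.deferCleanup(func() { order = append(order, "delete pod") })
	c.run()
	return order
}

func main() {
	fmt.Println(runExample()) // [delete pod delete namespace]
}
```

Because teardown mirrors setup, the pod is removed before the namespace that contains it.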
Objects created in the test namespace do not need to be deleted because
maybe you should introduce at the beginning of the section that the framework creates a test namespace to avoid test pollution ... I don't know whether this behavior of "test namespaces" is well known outside of Kubernetes
Objects created in the test namespace do not need to be deleted because
deleting the namespace will also delete them. However, if deleting an object
may fail, then explicitly cleaning it up is better because then failures or
also, beware of objects that take some time to be fully deleted: I made this mistake with "terminating pods", where I set a grace period of 300 seconds and the pod blocked the garbage collector for that time
may fail, then explicitly cleaning it up is better because then failures or
timeouts related to it will be more obvious.
In cases where the test may have removed the object, `framework.IgnoreNotFound` |
indeed, this is a common mistake
## Polling and timeouts
When waiting for something to happen, use a reasonable timeout. Without it, a |
When waiting for something to happen, use a reasonable timeout. Without it, a
When waiting for something to happen and you need to do asynchronous assertions, use a reasonable timeout. Without it, a
or "asynchronous checks", I think people are familiar with this term
When waiting for something to happen, use a reasonable timeout. Without it, a
test might keep running until the entire test suite gets killed by the
CI. Beware that the CI under load may take a lot longer to complete some
operation compared to running the same test locally. On the other hand, a too
" On the other hand," here is misleading, I think that you may express the too main problems
- short timeout, the test will flake, per example if the CI is slow
- long timeout, the test may hide underline issues, per example, if there are some races with other components and eventually the condition pass
The thing with timeouts, is that you also should define what is the expected time you consider valid for a an operation to succeed, e2e are not only functiona, i.e. creating a pod and it takes more than 10 minutes to run should not pass because that environment is too busy, or Services can not take more than 1 minute in program the dataplance, ... timeouts are also important to set the upper limits for some behaviors
the key is to strike the right balance: a timeout that doesn't flake and a duration that is considered acceptable for that operation
There are two problems with too long timeouts:
- a feature is broken and some expected state will never occur, but the test needs to run till it times out waiting for that state
- a feature is normally supposed to work within a certain time frame and for some reason is taking too long
The problem is that we don't have good enough control over the performance of the clusters that we test against, nor do many features have any solid "must work within XYZ seconds". Solving that problem goes beyond what we can solve right now.
I'll clarify the first point and add a comment about the second.
- informative during interactive use (i.e. intermediate reports, either
  periodically or on demand)
- little to no output during a CI run except when it fails
these two points can be confusing, since it is difficult to be informative without giving output 😄
I'll explain that the amount of information should depend on how the E2E suite was invoked.
- extension mechanism for writing custom checks
- early abort when the condition cannot be reached anymore
[`gomega.Eventually`](https://pkg.go.dev/github.com/onsi/gomega#Eventually) |
it also accepts contexts :)
Which is part of "all criteria", right? So no need to change anything in the text.
yeah, that was just a personal comment to show my +1 to this function
area, so beware that these APIs may
change at some point.
- Use `gomega.Consistently` to ensure that some condition is true
this is important, unfortunately I'm afraid we are not doing it as much as we should
LGTM. I left some comments that are not blockers and if necessary can be follow-ups. /assign @sftim You need docs people for approval, IIRC.
/hold I'll follow up on some of the suggestions before this gets merged.
Force-pushed c66c356 to 846fba3
/hold cancel PR updated, ready for another LGTM and approval.
/lgtm cancel
The publication date should be in the future at the time we merge it. However, it looks otherwise OK.
---
layout: blog
title: "E2E Testing Best Practices, Reloaded"
date: 2023-04-04
For a publication date, how about 2023-04-12?
BTW, approval is SIG ContribEx blog team.
/lgtm Thanks
"Writing good E2E tests" was already updated a while ago in kubernetes/community#7021. As suggested there (kubernetes/community#7021 (comment)), we should bring this update to the attention of more contributors, hence this blog post.
Force-pushed 846fba3 to 7f1a7e9
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mrbobbytables, pohly, xmcqueen The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/cc @aojea @jberkus