blog: updated E2E best practices #392
Conversation
It would be great to get this published before KubeCon EU 2023, because then readers have the opportunity to ask questions on-site in person. @aojea: The intro, the architecture and the "next steps" are new. The rest is text that you already reviewed earlier for kubernetes/community#7021, just updated a bit to make it flow better in a blog post.
excellent
/lgtm
/assign @mrbobbytables for approval. Let's get this published before KubeCon, then folks can chat with me about it there.
`ginkgo.DeferCleanup` executes code in the more useful last-in-first-out order,
i.e. things that get set up first get removed last.
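The LIFO ordering can be illustrated with a plain Go sketch. The `cleanupStack` type and its methods below are hypothetical stand-ins, not the real Ginkgo API; only the ordering semantics mirror `ginkgo.DeferCleanup`:

```go
package main

import "fmt"

// cleanupStack is a hypothetical stand-in that mimics the
// last-in-first-out semantics of ginkgo.DeferCleanup:
// callbacks registered first run last.
type cleanupStack struct {
	callbacks []func()
}

// deferCleanup registers a teardown callback.
func (c *cleanupStack) deferCleanup(cb func()) {
	c.callbacks = append(c.callbacks, cb)
}

// run executes the callbacks in reverse registration order.
func (c *cleanupStack) run() {
	for i := len(c.callbacks) - 1; i >= 0; i-- {
		c.callbacks[i]()
	}
}

// runExample "sets up" a namespace, then a pod inside it, and
// returns the order in which the cleanups actually ran.
func runExample() []string {
	var order []string
	var c cleanupStack
	c.deferCleanup(func() { order = append(order, "delete namespace") })
	c.deferCleanup(func() { order = append(order, "delete pod") })
	c.run()
	return order
}

func main() {
	fmt.Println(runExample()) // [delete pod delete namespace]
}
```

Because teardown mirrors setup, the pod is removed before the namespace that contains it.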
Objects created in the test namespace do not need to be deleted because
maybe you should introduce at the beginning of the section that the framework creates a test namespace to avoid test pollution ... I don't know whether this behavior of "test namespaces" is well known outside of Kubernetes
Objects created in the test namespace do not need to be deleted because
deleting the namespace will also delete them. However, if deleting an object
may fail, then explicitly cleaning it up is better because then failures or
also, beware of objects that take some time to be fully deleted: I made this mistake with "terminating pods", where I set a grace period of 300 seconds and the pod blocked the garbage collector for that time
may fail, then explicitly cleaning it up is better because then failures or
timeouts related to it will be more obvious.
In cases where the test may have removed the object, `framework.IgnoreNotFound` |
indeed, this is a common mistake
## Polling and timeouts
When waiting for something to happen, use a reasonable timeout. Without it, a |
When waiting for something to happen, use a reasonable timeout. Without it, a
When waiting for something to happen and you need to do asynchronous assertions, use a reasonable timeout. Without it, a
or "asynchronous checks", I think people are familiar with this term
When waiting for something to happen, use a reasonable timeout. Without it, a
test might keep running until the entire test suite gets killed by the
CI. Beware that the CI under load may take a lot longer to complete some
operation compared to running the same test locally. On the other hand, a too
" On the other hand," here is misleading, I think that you may express the too main problems
- short timeout, the test will flake, per example if the CI is slow
- long timeout, the test may hide underline issues, per example, if there are some races with other components and eventually the condition pass
The thing with timeouts, is that you also should define what is the expected time you consider valid for a an operation to succeed, e2e are not only functiona, i.e. creating a pod and it takes more than 10 minutes to run should not pass because that environment is too busy, or Services can not take more than 1 minute in program the dataplance, ... timeouts are also important to set the upper limits for some behaviors
the key is to strike the right balance: a timeout that doesn't flake and a duration that is considered acceptable for that operation
There are two problems with too long timeouts:
- a feature is broken and some expected state will never occur, but the test needs to run till it times out waiting for that state
- a feature is normally supposed to work within a certain time frame and for some reason is taking too long
The problem is that we don't have good enough control over the performance of the clusters that we test against, nor do many features have any solid "must work within XYZ seconds". Solving that problem goes beyond what we can solve right now.
I'll clarify the first point and add a comment about the second.
- informative during interactive use (i.e. intermediate reports, either
  periodically or on demand)
- little to no output during a CI run except when it fails
these two points can be confusing, since it is difficult to be informative without giving output 😄
I'll explain that the amount of information should depend on how the E2E suite was invoked.
- extension mechanism for writing custom checks
- early abort when the condition cannot be reached anymore
[`gomega.Eventually`](https://pkg.go.dev/github.com/onsi/gomega#Eventually) |
it also accepts contexts :)
Which is part of "all criteria", right? So no need to change anything in the text.
yeah, that was just a personal comment to show my +1 to this function
area, so beware that these APIs may
change at some point.
- Use `gomega.Consistently` to ensure that some condition is true
this is important, unfortunately I'm afraid we are not doing it as much as we should
LGTM. I left some comments that are not blockers and if necessary can be follow-ups. /assign @sftim You need docs people for approval, IIRC.
/hold I'll follow up on some of the suggestions before this gets merged.
Force-pushed c66c356 to 846fba3
/hold cancel PR updated, ready for another LGTM and approval.
/lgtm cancel
The publication date should be in the future at the time we merge it. However, it looks otherwise OK.
---
layout: blog
title: "E2E Testing Best Practices, Reloaded"
date: 2023-04-04
For a publication date, how about 2023-04-12?
BTW, approval is SIG ContribEx blog team.
/lgtm Thanks
"Writing good E2E tests" was already updated a while ago in kubernetes/community#7021. As suggested there (kubernetes/community#7021 (comment)), we should bring this update to the attention of more contributors, hence this blog post.
Force-pushed 846fba3 to 7f1a7e9
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mrbobbytables, pohly, xmcqueen The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/cc @aojea @jberkus