Fix flakiness in the E2E test e2e_multi_cluster_replica_set_scale_up (#231)
# Summary
The E2E test `e2e_multi_cluster_replica_set_scale_up` has been flaky, and
@lucian-tosa suggested that we fix it. It has been failing while waiting
for the StatefulSets (STSs) of a multi-cluster MongoDB deployment to have
the correct number of replicas. The problem was that sometimes, after the
`MongoDBMultiCluster (mdbmc)` resource reached the Running phase (which
should mean all the STSs are ready), some of the STSs went back to a
not-ready state.
When we see that the `mdbmc` resource is Running, we check that the STSs
have the correct number of replicas. Because of the problem above (STSs
transitioning from ready to not ready), an STS sometimes did not have
the correct number of replicas at that moment, and the test failed.
An STS transitioned from ready to not ready because the pod it manages
did the same. Looking into it further, we found that the pod's readiness
probe sometimes fails momentarily. As a result, the pod becomes ready,
transitions to not ready (readiness probe failed), and then eventually
becomes ready again. This is documented in much more detail
[here](https://jira.mongodb.org/browse/CLOUDP-329231).
The ideal fix would be to figure out why the readiness probe fails and
fix that. This PR instead contains a workaround: it changes the test
slightly so that it waits for the STSs to reach the correct number of
replicas.
Jira ticket: https://jira.mongodb.org/browse/CLOUDP-329422
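The workaround described above (waiting for the replica count instead of asserting it once) can be sketched roughly as follows. This is only an illustration of the polling idea, not the code in this PR: `wait_for_replicas` and `get_ready_replicas` are hypothetical names, and the real test relies on its own helpers for reading the STS status.

```python
import time
from typing import Callable


def wait_for_replicas(
    get_ready_replicas: Callable[[], int],  # hypothetical: returns the STS ready replica count
    expected: int,
    timeout: float = 60.0,
    interval: float = 1.0,
) -> None:
    """Poll until the reported ready replica count equals `expected`, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        ready = get_ready_replicas()
        if ready == expected:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"expected {expected} ready replicas, last saw {ready}"
            )
        time.sleep(interval)


# Simulated readings: the pod is briefly not ready, then recovers.
# A one-shot assertion on the first reading would fail; waiting does not.
readings = iter([1, 1, 2])
wait_for_replicas(lambda: next(readings), expected=2, timeout=5.0, interval=0.0)
```

A one-shot check fails whenever it happens to run during the brief not-ready window, whereas the polling version tolerates it and only fails if the STS never reaches the expected count within the timeout.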
## Proof of Work
Ran the test `e2e_multi_cluster_replica_set_scale_up` manually locally
to make sure that it's passing consistently. I am not able to reproduce
the flakiness now.
## Checklist
- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you checked for release_note changes?
# Even though we already verified, in previous test, that the MongoDBMultiCluster resource's phase is running (that would mean all STSs are ready);
# checking the expected number of replicas for STS makes the test flaky because of an issue mentioned in detail in this ticket https://jira.mongodb.org/browse/CLOUDP-329231.
# That's why we are waiting for STS to have expected number of replicas. This change can be reverted when we make the proper fix as
# Even though we already verified, in previous test, that the MongoDBMultiCluster resource's phase is running (that would mean all STSs are ready);
# checking the expected number of replicas for STS makes the test flaky because of an issue mentioned in detail in this ticket https://jira.mongodb.org/browse/CLOUDP-329231.
# That's why we are waiting for STS to have expected number of replicas. This change can be reverted when we make the proper fix as