I want to share some experience from hacking on the e2e testing framework in https://github.com/pingcap/tidb-operator.
Several months ago, we found that TiDB Operator's average CI success rate was lower than 50%, which made the PR workflow very painful. We quickly traced the problem to the e2e tests, and I volunteered to dig into the code and find out what was going on.
After some trial and error, we managed to raise the success rate to above 90%. The failures were mainly related to an internal resource-racing bug. Although the bug itself is not relevant to most people, we picked up some useful knowledge along the way, and I hope it is interesting to some of you. The link is here: Introduction to TiDB Operator E2E Test.
It mainly covers the Kubernetes e2e testing framework, integration with the Jenkins UI, debugging hints, and some useful external links. Hope you guys find it helpful!