Proposal: Restructure TiDB Tests

tison · July 7, 2021, 11:45pm

This is a pre-RFC to TiDB project. You’re encouraged to comment and share your thoughts. I’d start a formal proposal under pingcap/tidb if there is a lazy consensus.

You can see the progress on the tracking issue now.

Proposal: Restructure TiDB Tests

Author(s): tison
Discussion PR: https://github.com/pingcap/tidb/pull/26024
Tracking Issue: https://github.com/pingcap/tidb/issues/26022

Introduction

This document describes the proposal to restructure TiDB tests, focusing on unit tests and integration tests, especially on separating integration_test.go as well as unifying test infrastructure.

Motivation or Background

On the current codebase of TiDB, tests suffer from several major troubles.

There are many unstable tests mainly about concurrency logic, which impacts a lot of the development experience during the project.
The main test framework (pingcap/check) is lack of maintenance. Not only the integration with modern IDE such as GoLand is terrible so that every newcomer asks how to run a unit test, but also nobody is responsible for the test framework and the development on upstream (go-check/check) seems to halt.
Logs created by passed tests are noisy and redundant, see also discussion on TiDB Internals forum.
There are a lot of huge test files contains many cases of “integration tests” which make the tests subtle and brittle, as well as hard to be run in parallel.
- expression/integration_test.go has 9801 lines.
- executor/executor_test.go has 8726 lines.
- ddl/db_test.go has 6930 lines.
- session/session_test.go has 4825.
- planner/core/integration_test.go has 3900 lines.
The satellite tests are disordered. Where they are? How can we fix failures reported by them or write new cases?
- idc-jenkins-ci-tidb/common-test
- idc-jenkins-ci-tidb/integration-common-test
- idc-jenkins-ci-tidb/integration-br-test
- idc-jenkins-ci-tidb/integration-compatibility-test
- idc-jenkins-ci-tidb/integration-copr-test
- idc-jenkins-ci-tidb/integration-ddl-test
- idc-jenkins-ci-tidb/mybatis-test
- idc-jenkins-ci-tidb/sqllogic-test-1
- idc-jenkins-ci-tidb/sqllogic-test-2
- idc-jenkins-ci-tidb/tics-test

In this proposal, we are focus on dealing with the first four troubles, and leave the last trouble as a start of the discussion.

Detailed Design

To overcome the first four troubles listed above, I propose to start with migrating the main test framework from pingcap/check to stretchr/testify.

Besides, we refer to a series of suggestions for better tests in the appendix sections so that we can spawn another tracking issue to handle the problem caused by large test files, redundant logs, and unstable tests.

As described above, the main test framework of TiDB, i.e., pingcap/check, is lack of maintenance. Not only the integration with modern IDE such as GoLand is terrible so that every newcomer asks how to run a unit test, but also nobody is responsible for the test framework. What’s worse, the development on upstream (go-check/check) seems to halt.

stretchr/testify is a widely adopted test framework for Golang project. We can make a quick comparison between these projects.

Dimension	pingcap/check	go-check/check	stretchr/testify
stars	16	617	13.6k
contributors	17	13	188
development	little, inactive for a year	little, inactive for a while	active
integration	poor	out of go testing	go testing, GoLand, and so on
checkers	poor	even poor than pingcap/check	fruitful

There is no doubt stretchr/testify is a better adoption than pingcap/check or go-check/check. You can see also the complaint on TiDB Internals forum, named Better way to debug single unit test in tidb.

Many of Golang projects previous using pingcap/check, such as pingcap/errors, pingcap/log, and tikv/client-go, migrate to stretchr/testify and gain all benefits.

Implementation Plan

The implementation of detailed design is separated into three phases.

Phase 0: Start the migration to stretchr/testify by introducing the dependency, but new tests can be still written in pingcap/check.
Phase 1: Once test kits in stretchr/testify version are implemented and there are adequate migration examples, prevent new tests written in pingcap/check.
Phase 2: Finalize the migration and drop the dependency to pingcap/check.

It is planned to finish the migration in two sprints or three, i.e., four months or by the end of this year.

I will moderate the proposal and call for a vote for phase 1 on the forum and reflect the result in the tracking issue. The prerequisites should include implement all test kits necessary and have copyable migration samples. We may do it packages by packages, though.

Test Design

This proposal is basically about tests themselves. And what we design to do with tests is covered by the detailed design section.

Impacts & Risks

Projects depend on github.com/pingcap/tidb/util/testkit or other test utils may break

For example, pingcap/br, which has an unfortunate circle dependency with pingcap/tidb for historical reasons, depends on github.com/pingcap/tidb/util/testkit and will be broken after we reach phase 2.

However, those projects can copy and paste the missing kits when implementation comes to phase 2. Also, we can take care of these cases and help them get rid of pingcap/check later.

Comment here if other projects suffer from this issue.

pingcap/br

Investigation & Alternatives

To be honest, go testing already provides a test framework with parallel supports, children tests supports, and logging supports.

kubernetes, coredns, and etcd all make use of go testing, and only kubenetes uses stretchr/testify for its fruitful checkers.

This proposal keeps the same preference that we stay as friendly with go testing as possible, make use of checkers provided by stretchr/testify, and generate our own test kits.

Test Improvement Suggestions

Leverage test logger implemented at pingcap/log#16

This section continues from the previous discussion on the TiDB Internals forum, named Turn off redundant logging in TiDB tests.

Logs created by passed tests are generally less interesting to developers. go.uber.org/zap provides a module named zaptest to redirect output to testing.TB. It only caches the output per test case, and only print the output to the console if the test failed.

pingcap/log#16 brings this function to pingcap/log and we’d better make use of it to prevent redundant logs output. In detail, we will introduce a method with the signature logutil.InitTestLogger(t zaptest.TestingT, cfg *LogConfig) and enable in every test related to logs.

Note that the test logger only prevents logs when go test without -v flag. But since we keep printing logs on failure case, we can safely remove the flag.

Prefer determinate tests for concurrency logics

Make use of sync.Mutex, sync.WaitGroup, or other concurrent utils like latches, and get rid of “long enough” sleep, which is brittle and causes unnecessary time for tests.

Separate huge test files into small test units that can be run in parallel

expression/integration_test.go has 9801 lines.
executor/executor_test.go has 8726 lines.
ddl/db_test.go has 6930 lines.
session/session_test.go has 4825.
planner/core/integration_test.go has 3900 lines.

We should break these test files into small test units and try to perform white-box tests instead of just testing the end-to-end behavior.

Unresolved Questions

What parts of the design are still to be determined?

xhebox · June 29, 2021, 6:50am

The implementation plan is not detailed enough to be executed steadily. Here are some concerns:

The migration of framework will cost lots of time. And improving tests while migrating will potentially add more and more new works to the TODO.
And some tests will not be better with another new framework.

The proposal is already a long run. And it will be longer and longer if rethinking tests are included. And we are likely lost. I suggest:

We can do the migration mechanically: only switch the framework and fix some simple tests. That is, we will classify tests by workload and types(small copy&paste, little refactor, or may not be resolved in months/years). Small refactors will be acceptable to finish along the migration, e.g. increasing parallel executions by splitting tests into some human-readable forms.

And middle/large/special ones are only recorded, e.g. fix the logger in tests. It is independent of switching frameworks. Tests that can not be improved with the new framework is likely the case. We can list the work down and give a more accurate schedule.

tison · June 29, 2021, 7:03am

And improving tests while migrating will potentially add more and more new works to the TODO.

At least I don’t think it is a good suggestion if you want to migrate “expression/integration_test.go” JUST TO testify and remain everything AS IS. I don’t mean rewrite TiDB, but testable codebase to obtain the migrated tests.

And some tests will not be better with another new framework.

Which? From where I can see there is no need to use pingcap/check, I suggest adopt go testing while using testify with their checkers. We will also make test kits do not depend on pingcap/check. So this statement does not hold.

The proposal is already a long run. And it will be longer and longer if rethinking tests are included. And we are likely lost.

I agree, so we have a final stage to accept principles. So far, there are no more than two. One is prevent undetermined tests, the other is break down huge test files. I don’t think they are unreasonable or something we’d like to keep if we migrate to a new test framework.

We can do the migration mechanically

Please never do it. Automation will ruin you because tests are very tightly coupled as well as subtle. If you take a look at how we write tests currently, you don’t suggest do it mechanically.

We can list the work down and give a more accurate schedule.

Definitely. The tracking issue will contains two section in my mind currently.

# Migrations

* [ ] Implement {{ some_test_kit }}
* [ ] Migrate {{ some_test }} to testify
* [ ] Migrate {{ another_test }} to testify

# Unstable tests

* [ ] Make {{ some_test }} run determinate
* [ ] Make {{ another_test }} run determinate

# Test log

* [ ] Using test logger in {{ some_test }}
* [ ] Using test logger in {{ another_test }}

We can create 3 tracking issue for each purpose, though.

xhebox · June 29, 2021, 7:35am

I does not mean the framework is useless: it is of little use to other types of problems. For example, planner/core/logical_plan_test.go does not depend on testkit. HandleDDLEvent and OnSchemaStateChanged both serve for test ddl refreshing, yet we can not remove one.

Rethinking is easy, but the solution is not. It is out of the scope of the proposal, in some aspects, but is, indeed, essential about improving/restructuring tests.

I think splitting tests/parallel executions is in the scope of this proposal: it is totally possible. But other kinds of works depend on their own situations.

Misunderstanding. I mean be stupid about improving tests. Some improvements are very hard. Record it will be better.

I think it is also a part of the proposal. Maybe re-edit the page.

tiancaiamao · June 29, 2021, 10:50am

I like the idea of improving our testing, it’s beneficial in the long run.

My concern is, it involves tooooooooooooo much work.

If we have 8-10 members promise they will participate in this activity and claim part of the package files as their task, this might be possible. Otherwise the scope of this proposal is too big to be finished.

dveeden · June 29, 2021, 10:58am

This would probably avoid issues like this: https://github.com/pingcap/parser/pull/1227 which is good. But indeed it is a lot of work.

tison · June 29, 2021, 11:25am

@xhebox @tiancaiamao @dveeden Thanks for your replies! I leave a few blank in the first version of proposal to collect potential ideas.

If you agree on the direction and be afraid about the scope, don’t worry. I will rephrase significant paragraphs to narrow the scope of this proposal in days and ping you then.

When it comes to the implementation plan, what is most important to be reviewed is that how much impact it causes to feature development.

From the content above, we plan to introduce stretchr/testify as a test dependency at the first step. Given that other feature development happens concurrently, we don’t forbid new tests to be written in pingcap/check in Phase 0, but still encourage to write in stretchr/testify unless there is obvious reason cannot do that. And later in Phase 1, we forbid it.

Here come two questions.

When to move to Phase 1?
What if feature development touches test files moved to stretchr/testify?

To the first question, I will moderate the proposal and call for a vote for phase 1 on the forum and reflect the result to the tracking issue. The prerequisites should include implement all test kits necessary and have copyable migration sample. We may do it packages by packages, though.

To the second question, I’d suggest the feature crew writes tests in stretchr/testify; otherwise creates a new file to hold the tests. The speed of migration is normally faster than new test creation and we can later move to Phase 1.

Thus, I think the impact to feature development is manageable.

If you worry about the man-force devoted to this effort, I’ll moderate the proposal and as what I do now, call for a wide range of contributors to participant in the migration. It is planned to finish the migration in two sprints or three, i.e., four months or by the end of this year, which seems sufficient to me.

Of course I have to clarify that we don’t intent to do it in a long time since codebase separation is painful to develop. We have a clear goal and explicit checkpoint (remove pingcap/check dependency) and the separation will be temporary.

disksing · June 29, 2021, 11:53am

We have implemented another version of script in PD.

github.com

tikv/pd/blob/master/scripts/check-testing-t.sh

#!/bin/bash

# Check if there are any packages foget to add `TestingT` when use "github.com/pingcap/check".

res=$(diff <(grep -rl --include=\*_test.go "github.com/pingcap/check" . | xargs -L 1 dirname | sort -u) \
     <(grep -rl --include=\*_test.go -E "^\s*(check\.)?TestingT\(" . | xargs -L 1 dirname | sort -u))

if [ "$res" ]; then
  echo "following packages may be lost TestingT:"
  echo "$res" | awk '{if(NF>1){print $2}}'
  exit 1
fi

# Check if there are duplicated `TestingT` in package.

res=$(grep -r --include=\*_test.go "TestingT(" . | cut -f1 | xargs -L 1 dirname | sort | uniq -d)

if [ "$res" ]; then
  echo "following packages may have duplicated TestingT:"
  echo "$res"

This file has been truncated. show original

disksing · June 29, 2021, 11:58am

I also think we should semi-mechanically complete the library replacement first. Replacing the test framework is a task that can be done, sorting out and optimizing the tests should be a constant task with no end. Better not to mix them up. Of course, it’s not a bad idea to fix some handy issues while migrating the test framework.

tison · June 29, 2021, 2:57pm

You are right. I will downgrade the “Rethink each tests during the migration” to a suggestion and create a separated tracking issue to continuous reduce unstable tests.

Break down huge test files, on the other hand, should be handled in this pass anyway. I don’t think it is a good experience migrate 10,000 lines of code or review it at once.

tison · June 29, 2021, 2:57pm

Not really. I don’t plan to involve many of senior developers spend a lot of time to migrate tests. But involve as many as possible new contributors so that they can also understand the codebase when migrate tests. Inputs on things unreasonable, undocumented, or hard to understand, are valuable. Of course it is out of the scope of the proposal and we don’t set it as a MUST requirement but a preference.

Committers and reviewers works on the review and merge side to prevent introduce risks.

Also, if we just move tests from one framework to another framework in the same structure without any changes, I will recognize it as cargo cult and avoid doing that.

disksing · June 29, 2021, 3:57pm

I don’t think replacing test framework is cargo cult. You did mention some good features of testify. If it is meaningless to only replace gocheck, why don’t you simply improve tests without changing test framework?

tison · June 29, 2021, 4:18pm

Right. Then you also recognize the value to improve tests either on the fly or later.

The term “cargo cult” is possibly offensive, I apology for it.

tison · June 30, 2021, 11:00am

@disksing @xhebox @tiancaiamao @dveeden I’ve updated the content, please take a look.

If it is good to go, I’m going to create a PR in pingcap/tidb repo. But I’m also willing to wait for a little bit so that we don’t rush things.

disksing · July 1, 2021, 10:46am

Sounds good to me. //

tiancaiamao · July 1, 2021, 2:04pm

I can’t say I agree or disagree.
I do support you spiritually, it’s something good!
But sometimes we know “the value of everything and the price of nothing” … until we really do it.
The worst case is that we leave it in an intermediate state, and the tests become more messy with two existing test frameworks.
… but if we never try, we never know … so if you really like it, go ahead, and good luck.

tison · July 2, 2021, 2:42am

I understand your concern. It would require quite a few reviewing support from reviewers and committers. But we can reach the final stage.

If there is no further concern, I’m going to create a proposal in pingcap/tidb the next week.

tison · July 7, 2021, 11:46pm

Hi audience in this thread,

Thanks for your attention. I create a pr to bring this design document to pingcap/tidb. Please take a look.

best,
tison.

feitian124 · July 26, 2021, 7:46am

i’m a fresh man here.
i think it is not a good idea to skip the work just because there are too much work.
use fresh man is a good idea and i’m glad to contribute to tidb.

tison · July 27, 2021, 11:16pm

Thanks for your participant @feitian124 !

As I wrote in This Week on TiDB 20210711, join force in restructure tests is a good first for contributors, especially this proposal covers a wide range of packages under pingcap/tidb.

When you participant this effort, there is low hanging fruit you can get familiar with golang programming, testing, and concurrency, as we started from “util” dir. Later we go to complex package such as “executor”, “planner”, “ddl”, “seesion”, etc. and grow our skill set gradually.

I the proposer will handle the progress of the whole effort, but for every contributor especially new contributors, it is better you spend more time reading the production code and think of its reasonability. Refactor on demand and we get an understandable, stable code base. This kind of improvements can be driven from but not belong to the proposal itself.

Proposal: Restructure TiDB Tests

Proposal: Restructure TiDB Tests

Table of Contents

Introduction

Motivation or Background

Detailed Design

Implementation Plan

Test Design

Impacts & Risks

Projects depend on github.com/pingcap/tidb/util/testkit or other test utils may break

Investigation & Alternatives

Test Improvement Suggestions

Leverage test logger implemented at pingcap/log#16

Prefer determinate tests for concurrency logics

Separate huge test files into small test units that can be run in parallel

Unresolved Questions