How to break down change set of my CTE feature branch?

wjhuang2016 · June 10, 2021, 5:44am

Say that I am working on CTE.
To make some SQLs work. We need the plan building part and the executor part.
Both of them are about 1000 LoC. And the tests may also about 1000 LoC.
So it’s not easy to get them merged.

tison · May 11, 2021, 10:55am

I’d suggest separate the whole feature into multiple pull requests so that reviewers can review them conceptually isolated one by one. While we merge them only if the whole feature, with the test is well qualified.

Here is an example how Flink manages this type of effort.

Above is the last pull request for the whole effort of FLINK-11843, a large refactor to Dispatcher. And you can trace the based on message in the description find its dependent chain. The integration tests is introduced by the last pull request, and we review these pull requests step by step. While all of the pull requests merged, it is general access.

Back to the CTE case, you may have three parts of pull request

executor part(no dep)
planner part(depend on executor)
integration tests

and review them as a whole. Also I think you already separate the parser part pull request.

hundundm · May 11, 2021, 11:26am

I am a developer on the PD. This is an example of mine, and it may have some reference significance.

I first complete the main development and testing on a branch, try to classify all the changes, and specify a PR submission plan. Then write them one by one according to the plan, and each PR focuses on the details previously ignored. All progress is recorded in an issue, which is convenient for reviewers to understand related PRs.

tison · May 11, 2021, 11:45am

… and you might elaborate the challenge you face. AFAIK code can hardly be separated because of their complexity instead of their loc. Even you attach your dev work so far here and I can help on the reviewer side or throw suggestions of possible separation.

wjhuang2016 · May 11, 2021, 12:02pm

The first problem is that, it’s very hard to test the executor alone.
And the second problem is that, it’s a good idea to use a single PR to add the integration tests?
If so, how can I test the planner part?

tison · May 11, 2021, 12:45pm

I think you can take a look at materials linked above and the answer is there.

The first problem is that, it’s very hard to test the executor alone.

You can add basic unit test and leave the integration tests later.

And the second problem is that, it’s a good idea to use a single PR to add the integration tests?
If so, how can I test the planner part?

If the parts for the minimal integration tests you separate into are planner and executor, and planner depends on executor, as mentioned above

You may join the integration tests part and planner part into one pull request, as links to Flink demo above, the last pull request contains integration tests.

However, you can try to introduce the minimal feature set of CTE, said support a naive select only, and later enrich the planner part if you want to separate more fine grained.

tison · May 11, 2021, 12:56pm

… and for development task broken down, the most significant part is first define the central interface. For CTE, it could be CTEStorage, CTEExec and so on. Every part interacts with the central interfaces which ensures effort could be joined at the end.

Every part takes care of its own unit tests if there should be valid ones, and even integration tests among subparts of one part. The one who finalize the whole tasks take care of overall integration tests.

wjhuang2016 · May 11, 2021, 1:13pm

Every part takes care of its own unit tests if there should be valid ones

How to add a unit test for the executor without the planner part? It’s not easy to mock a real planner for it.

However, you can try to introduce the minimal feature set of CTE, said support a naive select only, and later enrich the planner part if you want to separate more fine grained.

If so, the PR is buggy. It has many bugs. For example, if you use JOIN, then it will panic.

tison · May 11, 2021, 1:33pm

When I draft the plan to support foreign data wrapper, planner and executor is a whole part. The original text is

Every part takes care of its own unit tests if there should be valid ones

which means if there is no valid one, skip this step. I’ve already attched a link to how Flink develop does this job https://github.com/apache/flink/pull/9832 and you can see some of its prerequisites there is no new tests and that is ok.

Only if you use CTE with JOIN, right? That is what I mean first a naive set and spread it. But if it is too fine grained, you can just put all stuff at once.

So far, we discuss conceptually without you real dev branch and task which is to be ported to TiDB. If all dev tasks developed by yourself, you can create a pull request and the reviewer could help guide you separate the pull request if their review experience stucks. Otherwise, it is nonissue.

wjhuang2016 · May 11, 2021, 1:35pm

Yeah, but we are required to make a PR’s LoC less than 500 (:

tison · May 11, 2021, 1:47pm

Could you please share the link to stuff you’d like to push to upstream. Then we can start with the real issue.

I think general suggestions are already there as linked by @hundundm 's and me. You don’t click the link and thus I guess you’d like to solve you real issue, so let’s get the real issue first. It’s not how can I conceptually break down tasks FOR implementing CTE, but I have a pull request over 1000+ lines, how can I break down to several pull requests in less lines. First of all, we should get that diff set.

wjhuang2016 · May 13, 2021, 12:48pm

Here is the demo PR.
I think I can move some stuffs out of this PR.

tison · May 20, 2021, 6:45am

Sorry for late reply. I’ve commented on the demo pr.

wjhuang2016 · May 20, 2021, 7:23am

And the next question is how to add the tests?
It’s not easy to test it since it only covers the plan build, so is the executor part.
They need to test together.

tison · May 24, 2021, 3:53pm

The example from Flink community attached above explains the details.

It is mainly about PR separation best practices as well as Git manipulation. You might take a look at this Chinese blog for a complete discussion on Git branches management in industry.

Back to the question, I can see you have already broken down parser part pull request. And the demo PR linked above is only about planner while you did even extract the error code part into a self-contained one.

For the remain tasks on the planner part, the execution part, and also the testing part. I suggest you use two PRs to propose the change set.

the first (base) PR contains the planner part
the second PR based on the first PR adds the execution part and the testing part

… then you can open two PRs at the same time and reviewers review them together, while when they are looking in the first PR, they focus on the planner part; and in the second PR, focus on the additional execution and testing part.

For branch manipulation, you may want to know how to reconcile comment address and fixups. Below is an example flow (all push is omitted).

git checkout -b first
# develop the planner part
git add .
git commit -s -m "..."
git checkout -b second
# develop the rest part
git add .
git commit -s -m "..."
# when something wrong happens in the first branch
git checkout first
# fixing...
git add .
git commit -s --fixup=HEAD
git checkout second
git rebase first