Proposal: Move parser back to pingcap/tidb

There is an RFC on github now. Please go github and comment, the link is here. The RFC will be seen as accepted, once it is LGTM3 and there is no objection in 3 days.

Motivation

In short, I want to move pingcap/parser back to the pingcap/tidb repository in the form of go sub-module. It is moved to a separate repository due to dependency problems but now the thing changed.

Why and How

I will explain why and how to migration parser back to the tidb repository.

Why did we move parser into a separate repository?

The PR is tidb#2923: Move TiDB parser to a separate repository. In short, when we migrated to go module 111, @tiancaiamao decided to do so, because

We can put generated parser.go in the repository directly, rather than generate it using Makefile script. The drawback of this way is that every time parser.y is touched, there will be many lines change in parser.go . Then the TiDB repo will be inflate quickly.

A better way is moving parser to a separated repo, so every time parser is changed, only go.mod need to change accordingly.

Personally, that is a weird argument for me. Indeed, pingcap/tidb does not store the diff of parser.go anymore, but pingcap/parser does. How is moving the inflation from one repo to another better?

Another argument is about the messy dependency. The separation does improve the case.

What is wrong with a separate repository?

In fact, there is no big problem. However the development becomes inconvenient. You could argue that we have workarounds, but why not make the development easier?

  1. We always need to ensure the version of two go modules. Cherry-picking becomes troublesome since the import version must be a specific branch of pingcap/parser.

  2. It is a mysql parser, but is closely related to TiDB(basically everything except ast/mysql packages). You must code TiDB logic into the “standalone mysql parser” first before your real implementation, which means at least two PRs, even for small changes.

  3. Before the PR of parser merged, the PR of TiDB must use module directive to redirect a specific version of parser to fix CI.

Problems above have defeated the previous argument only go.mod need to change accordingly. We only separated code into two repositories, actual work did not decrease. However, our development experience does getting worse.

How do we do?

Obviously, we could move back to the old good way. But it must be a go sub-module, because:

  1. To keep compatibility. Parser has been imported by other tools and other users.
  2. To keep the clean dependency. That is why it is separated. We don’t want to make things worse again.

The detailed plan:

  1. Create parser directory in pingcap/tidb, which is synchronized with pingcap/parser, with replace github.com/pingcap/parser => ./parser.
  2. Stop sending new PRs and issues to pingcap/parser, new PRs should be directly sent to pingcap/tidb. And merge recent PRs in pingcap/parser as much as possible. Changes will be synchronized from pingcap/parser to pingcap/tidb, of course.
  3. Create wrapper packages for pingcap/parser, which re-exports things from pingcap/tidb/parser. I mean something like import ( . github.com/pingcap/tidb/parser/xxx), create a dummy function will eliminate the error of not used import. All code of pingcap/parser will be removed in this step.
  4. Announce the deprecation of pingcap/parser to possible users. Clean up old issues and PRs, create it in pingcap/tidb again, close it, or just forget about it.
  5. Archive pingcap/parser after, e.g. 4 months, i.e. two dev cycles. It mainly depends on users of pingcap/parser.

CI is not a problem. pingcap/parser only has unit tests, and integration tests make gotest of pingcap/tidb.

And by @hundundm, since sub-module is merely another module inside another module, we need to release it separately from the main module. That means our release team needs some extra work, which can be solved by a script: thanks to the identical release cycle/version between parser and tidb.

9 Likes

Just for a record, client-go is now dependent on parser (because it uses the error type defined by parser/terror, and we do have plans underway to resolve this dependency). We need to take care of the client-go dependency before we can merge parser to tidb.

It is a painless migration solution, as long as you agreed to migrate to the new import path eventually. One should not need to do anything before the final deprecation step.

  1. To keep compatibility. Parser has been imported by other tools and other users.

1 of the reasons parser was moved out of tidb repo, is the other projects that is dependent with the parser do not want to import the whole tidb module. Is it possible to solve it after moving the parser back?

It is a sub-module, which means the dependency is separated from the main module. It is like what etcd has done in Modularaization of etcd.

Futhermore, 1.17 has automatic dependency prune, which means we don’t even need to make it a sub-module. Just upgrade golang, and magic happens.

Thanks, I knew little about this before.

Changes will be synchronized from pingcap/parser to pingcap/tidb, of course.

Can it be done automatically?

Can it be done automatically?

Not worthy to automatically sync, maybe. If we stop sending new issues/PRs to parser, it is as simple as git pull or cp -a. There are only 22 opened PRs in parser.

And after we done step 3, synchronization is automatical. Because pingcap/parser simply redirects to pingcap/tidb/parser.

I like this proposal, which removes friction in developing TiDB, without adding any friction to other depending projects.

Maybe we can also remove the generated files from the repository as well?
Like this https://github.com/mjonss/parser/commit/2660281518f92ee6a62577e0c1b0d1083922de09

No. Then anyone importing pingcap/parser needs to generate the parser first. Refer to https://stackoverflow.com/questions/27402956/is-it-possible-use-go-build-with-extra-build-steps. Go lacks something like postbuild, preinstall in nodejs, or build scrits in rust.

That is not a big deal for us, but will be a problem for others. It does break compatibility in some aspects.

EDIT: we could upload the generated file only for released tarballs. That is what GNU autotools projects will do: generate ./configure.sh before releasing tarballs. Or you need to generate on your own by ./autogen.sh…

One problem is that TiDB did not maintain the semver of golang module, which means importing the correct commit version with generated parser is not trivial.

1 Like

+1 for this proposal

I have actually found myself trying to take shortcuts and not make changes that require pingcap/tidb and pingcap/parser changes combined (such as creating a new const for a string). This is not good for code quality; and I can’t be alone.

1 Like

kind of like java maven multiple-modules?

kind of like java maven multiple-modules?

Yeah, specifically, like etcd.

How does this change affect the library using pingcap/parser?
For simple cases I guess changing one line in go.mod would do, but we do have wired situations.
For example, I have a tool whose dependency relation looks like below and that really hurts my brain.
pkg -> pingcap/parser -> pingcap/tidb
pkg -> forked tidb -> other pingcap/parser (-> other other pingcap/tidb)

Would this migration help to resolve such issue?

How does this change affect the library using pingcap/parser?

I decided to make it painless, meaning you mostly don’t need to do anything before the final deprecation.

but we do have wired situations.

Even for the weird cases, I don’t want you to do anything.

How does this change affect the library using pingcap/parser?

But yes, you will need to migrate the import path of pingcap/parser to pingcap/tidb/parser in some commits before the final deprecation.

For everyone subscribed to the thread:

Now, there is an RFC on github. Please move to github for comments, the link is here . The RFC will be seen as accepted, once it is LGTM3 and there is no objection in 3 days.