Discuss: Public Design

What is the problem?

We have run into two problems:

  1. A bunch of features are designed privately and implemented without public discussion. There is no source of truth for those features, only tribal knowledge. For example, Raft Async IO, Cardinality Estimation Enhancement, SPM Enhancement, and so on.

  2. The lack of shared designs and public discussion hurts open source collaboration - contributors can hardly cooperate widely and smoothly without the necessary knowledge and information. For example, when @tangwz tried to understand and potentially participate in the Cardinality Estimation Enhancement work, the answer was “we discuss privately and develop in close”.

What is proposed?

I propose we stick to the design process documented in our development guide and the repository.

The checkpoint is that the final design document is checked in to the GitHub repository. We do not require that the whole story happen on GitHub - you can always have rough conversations before creating a proposal, start a pre-RFC on internals.tidb.io, or even use a public Google Doc (in English), etc. But you should bring all previous discussions to GitHub for the final design document, squash them into a result, and keep a link to the original discussion.

Call for Help: What are examples we can learn from?

The proposal above is clear but still far from practice. I’d like to ask for your help collecting good design examples that we can learn from to create our own excellent designs.

To narrow the scope of design, let’s focus on databases, operating systems, compilers, and networks.

Current example list (linked to the reply below):


It’s a natural requirement for running an open-source community that we can track the lifecycle of any feature, enhancement, or bugfix.

We’re now creating a tracking issue for each task, which is the root of implementation. But we still don’t have a strong requirement for RFCs on “substantial” changes, which are the root of design.

Any contributor should be able to learn a feature from its design discussions and implementation steps.



I think “restructure tests” and its design document are a good example.

It may be simple, but the process is clear enough.

I am experienced in the Python community and I highly recommend its proposal strategy. Even after 10 or more years, I can still get everything I want from a PEP (including the background, the discussions, and all proposed ideas).

The enhancement proposal process for Python is described in detail in PEP 1. The brief stages of an enhancement lifecycle are:

  1. Discuss the idea in python-list@python.org or python-ideas@python.org.
  2. If the reactions from the community are positive, write a draft PEP and publish it to python-dev@python.org for discussion.
  3. Collect community feedback and adapt the PEP.
  4. Submit the final version for review and wait for the call.

All the communication in the process is public and can be looked up in the future by anyone.

Successful PEPs usually contain some common parts (see “What belongs in a successful PEP?”). Take PEP 567 as an example. Reading it gives readers a clear description of why the proposal is needed (Rationale) and what it should look like (Introduction, Specification, Implementation, Summary of New APIs). These give feature implementors and users a clear vision of what the feature should be. Beyond that, the PEP also records the outcomes of the discussion stage (Rejected Ideas) and the feature's impact (Backwards Compatibility). If you want to know more discussion details or related knowledge and background, you can follow the links in the References section. In short, a PEP gives you all the information about what the feature is and its impact, why the feature is reasonable, and how to implement and use it.

I’d nominate “FLIP-76: Unaligned Checkpoints” as a good example.

FLIP-76: Unaligned Checkpoints

This design document is for implementing a new checkpoint strategy for Apache Flink.

It started with several quick private discussions among component experts to decide whether to go ahead and what to do.

Soon after they were convinced of the direction, Arvid Heise started a discussion on the mailing list.

[DISCUSS] FLIP-76: Unaligned checkpoints

It attracted other contributors interested in the concept to participate in the discussion, and consensus was reached through a voting thread.

[VOTE] [FLIP-76] Unaligned checkpoints

The design document itself is laid out as follows:

  1. Motivation (the reason we work for a feature is essential)
  2. Preliminary work (POC and rough benchmark)
  3. Proposed Changes (very detailed in every perspective affected)
  4. Compatibility, Deprecation, and Migration Plan (fundamental software must take care of public interfaces)
  5. Known Limitations
  6. Test Plan (we need test coverage, definitely)
  7. RoadMap (an implementation plan)

Later, it was implemented and tracked in an umbrella issue.

FLINK-14551 Unaligned checkpoints

+1 for this, I hope to see more proposals in tikv/rfcs before developing.


I’d nominate “The first design of Satellite 0.1.0” as a good example; it comes from the Apache SkyWalking community.

The design document is about implementing a light-weight sidecar that collects metrics, traces, and logs.

I was deeply attracted by its really detailed introduction and clear figures. The design doc is just like implementing the project in natural language!

I think reading a good design document is like reading poetry, and you will have a clear idea of how to accomplish it step by step once you have read it.

I would like to talk about the RFCs of CockroachDB, and nominate “version migration” as one of the good examples.

It describes a solution for dealing with incompatible changes during an in-place upgrade of a cluster, with a lot of details and explanations.

I think what we can learn from it is that it’s community friendly: it provides both guide-level and reference-level explanations, as well as the ‘recap’, so that contributors who know little of the background are able to understand the design.

Also, as TiDB is facing a similar problem, IMHO we can learn a lot from CRDB's solution.

I would also recommend the ‘placement rule in SQL’ RFC of TiDB (authored by @djshow832 and @morgo), which includes:

  • a brief but clear description of the motivation.
  • a detailed design with a user-oriented interface and internal implementation.
  • the examples of the use-cases.
  • the alternatives, risks and limitations.

IMHO the only concern is that it is complicated for engineers who are not familiar with the TiDB architecture, but I believe this is partly because the feature IS complicated.


As a reference, in MySQL, features are developed as Work Logs and are designed before the implementation is done. The result can be seen at https://dev.mysql.com/worklog/ after the worklog has been released in a MySQL release. We can get some inspiration from the format etc., but we should be open to the community while deciding on the design.