Discuss: Public Design

What is the problem?

We have run into two problems:

  1. A bunch of features are designed privately and implemented without public discussion. There is no source of truth for those features, only tribal knowledge. For example, Raft Async IO, Cardinality Estimation Enhancement, SPM Enhancement, and so on.

  2. The lack of shared designs and public discussion hurts open source collaboration: contributors can hardly cooperate widely and smoothly without the necessary knowledge and information. For example, when @tangwz tried to understand and potentially participate in the Cardinality Estimation Enhancement work, the answer was “we discuss privately and develop in close”.

What is proposed?

I propose we stick to the design process documented in our development guide and the repository.

The checkpoint is that the final design document is checked into the GitHub repository. We do not require that the whole story happen on GitHub: you can always have rough conversations before creating a proposal, start a pre-RFC on internals.tidb.io, or even use a public Google Doc (in English), etc. But you should bring all previous discussions to GitHub for the final design document, squash them into a result, and keep a link to the original discussion.

Call for Help: What are examples we can learn from?

The proposal above is clear but still far from practice. I’d like to ask for your help in collecting the best design examples that we can learn from to create our own excellent designs.

To narrow the scope of design, let’s focus on databases, operating systems, compilers, and networks.

Current example list (linked to the reply below):

Rationale

It’s a natural requirement of running an open-source community that we can track the lifecycle of any feature, enhancement, and bugfix.

We’re now creating a tracking issue for each task, which is the root of implementation. But we still don’t have a strong requirement that “substantial” changes have RFCs, which are the root of design.

Any contributor should be able to learn a feature by its design discussions and implementation steps.

:+1:

I think “restructure tests” and its design document are a good example.

It may be simple, but the process is clear enough.

I am experienced in the Python community and I highly recommend its proposal strategy. Even after 10 or more years, I can still get all the points I want from a PEP (including background, discussions, and all proposed ideas).

The enhancement proposal process for Python is described in detail in PEP 1. In brief, the stages of an enhancement lifecycle are:

  1. Discuss the idea in python-list@python.org or python-ideas@python.org.
  2. If the reactions from the community are positive, write a draft PEP and publish it to python-dev@python.org for discussion.
  3. Collect community feedback and adapt the PEP.
  4. Submit the final version for review and wait for the call.

All the communication in the process is public and can be looked up in the future by anyone.

Successful PEPs usually contain some common parts; see “What belongs in a successful PEP?”. Take PEP 567 as an example. Reading it gives readers a clear description of why the proposal is needed (Rationale) and how it should work (Introduction, Specification, Implementation, Summary of New APIs). These give feature implementors and users a clear vision of what the feature should be. Beyond that, the PEP also records the results of the discussion stage (Rejected Ideas) and the feature’s impact (Backwards Compatibility). If you want to know more discussion details or related knowledge and background, you can follow the links in the References section. In short, a PEP gives you all the information about what the feature is and what it affects, why the feature is reasonable, and how to implement and use it.

I’d nominate “FLIP-76: Unaligned Checkpoints” as a good example.

FLIP-76: Unaligned Checkpoints

This design document is for implementing a new checkpoint strategy for Apache Flink.

It started with several quick private discussions among component experts to decide whether to go ahead and what to do.

Soon after they were convinced of the direction, Arvid Heise started a discussion on the mailing list.

[DISCUSS] FLIP-76: Unaligned checkpoints

It attracted other contributors interested in the concept to participate in the discussion, and consensus was reached through a voting thread.

[VOTE] [FLIP-76] Unaligned checkpoints

The design document itself is laid out as follows:

  1. Motivation (the reason we work for a feature is essential)
  2. Preliminary work (POC and rough benchmark)
  3. Proposed Changes (very detailed in every perspective affected)
  4. Compatibility, Deprecation, and Migration Plan (fundamental software must take care of public interfaces)
  5. Known Limitations
  6. Test Plan (we need test coverage, definitely)
  7. RoadMap (an implementation plan)
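As a rough illustration of how such a layout could be enforced, here is a minimal sketch that reports which of the sections listed above are missing from a Markdown design document. The function name `missing_sections`, the use of Markdown `#` headings, and the idea of treating every listed section as required are all assumptions made for this example; nothing here is part of the Flink or TiDB tooling.

```python
import re

# Section names copied from the layout listed above. Treating all of them
# as "required" is an assumption of this sketch, not a rule from the thread.
REQUIRED_SECTIONS = [
    "Motivation",
    "Preliminary work",
    "Proposed Changes",
    "Compatibility, Deprecation, and Migration Plan",
    "Known Limitations",
    "Test Plan",
    "RoadMap",
]

# Matches an ATX-style Markdown heading such as "## Motivation" and
# captures the heading text without the leading or trailing '#' marks.
HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*#*\s*$")


def missing_sections(markdown_text, required=REQUIRED_SECTIONS):
    """Return the required section names that have no matching heading.

    Matching is case-insensitive and compares whole heading titles.
    """
    found = set()
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            found.add(match.group(1).strip().lower())
    return [name for name in required if name.lower() not in found]


if __name__ == "__main__":
    draft = "# My Proposal\n\n## Motivation\n\nWhy we need it.\n\n## Test Plan\n"
    for name in missing_sections(draft):
        print("missing section:", name)
```

A check like this could run against a proposal before review, but it only verifies that the headings exist, not that the sections say anything useful.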

Later, it was implemented, with progress tracked in an umbrella issue.

FLINK-14551 Unaligned checkpoints

+1 for this, I hope to see more proposals in tikv/rfcs before developing.

I’d nominate “The first design of Satellite 0.1.0”, which is from the Apache SkyWalking community, as a good example.

The design document is about implementing a lightweight sidecar that collects metrics, traces, and logs.

I was deeply attracted by its really detailed introduction and clear figures. The design doc is just like implementing the project in natural language!

I think reading a good design document is like reading poetry: once you have read it, you have a clear idea of how to accomplish it step by step.

I would like to talk about the RFCs of CockroachDB, and nominate “version migration” as one of the good examples.

It describes a solution for dealing with incompatible changes during an in-place upgrade of a cluster, with a lot of details and explanations.

I think what we can learn from it is that it’s community friendly: it provides both guide-level and reference-level explanations, as well as a ‘recap’, so that contributors who know little of the background are able to understand the design.

Also, as TiDB is facing a similar problem, IMHO we can learn a lot from CRDB’s solution.

I would also recommend the ‘placement rule in SQL’ RFC of TiDB (authored by @djshow832 and @morgo), which includes:

  • a brief but clear description of the motivation.
  • a detailed design with a user-oriented interface and internal implementation.
  • the examples of the use-cases.
  • the alternatives, risks and limitations.

IMHO the only concern is it is complicated for engineers who are not familiar with TiDB architecture, but I believe this is partly because the feature IS complicated.

As a reference in MySQL, features are developed as Work Logs and are first designed before the implementation is done. The result can be seen in https://dev.mysql.com/worklog/ after the worklog has been released in a MySQL release. We can get some inspiration from the format etc. but we should be open to the community while deciding on the design.

Thanks for your support! Would you propose other examples that we, the community, can learn from?

A couple of things I like about the MySQL worklogs:

  • They are generally less verbose (not always a good thing; but some of our proposals are very detailed which makes them hard to read).
  • They consider what testing will be required up front as a checklist.

What I don’t like about them:

  • There are no user stories. Sometimes the goal of the feature is not clear (and when it comes to implementation, understanding the user-story/use case is very useful if any tradeoffs or limitations need to be considered).

As a more general comment, one thing I’ve noticed about a lot of proposals is that I end up asking “what about X”, and the proposer says “that’s important, but out of scope for this proposal” (Recent Example).

Maybe it would be nice to add a “Goals and Non Goals” section (as bullet points) in the proposal introduction?

Let me try to add it today or tomorrow.

UPDATE: This exceeded my estimate, but I’ll keep a note for this :cry:

This is good: its tech design is clear and its subtasks are carefully tracked.
I believe one can easily figure out why, when, by whom, and how temporary table is implemented.
