Discuss: Public Design

tison · September 27, 2021, 3:18pm

What is the problem?

We have run into two problems:

There is a bunch of features that were designed privately and implemented without public discussion. For those features there is no source of truth, only tribal knowledge. For example, Raft Async IO, Cardinality Estimation Enhancement, SPM Enhancement, and so on.
The lack of shared designs and public discussion hurts open source collaboration - contributors can hardly cooperate widely and smoothly without the necessary knowledge and information. For example, when @tangwz tried to understand and potentially participate in the Cardinality Estimation Enhancement work, the answer was “we discuss privately and develop in close”.

What is proposed to do?

I propose we stick to the design process documented in our development guide and the repository.

The checkpoint is that the final design document is checked into the GitHub repository. We do not require that the whole story happens on GitHub - you can always have rough conversations before creating a proposal, start a pre-RFC on internals.tidb.io or even in a public Google Doc (in English), etc. But for the final design document you should bring all previous discussions to GitHub, squash them into a result, and keep a link to the original discussion.

Call for Help: What are examples we can learn from?

The proposal above is clear but still far from practice. I’d like to ask for your help in collecting the best design examples, which we can learn from to create our own excellent designs.

To narrow the scope of design, let’s focus on databases, operating systems, compilers, and networking.

Current example list (linked to the reply below):

Rationale

It’s a natural requirement for running an open-source community that we can track the lifecycle of any feature, enhancement, or bugfix.

We’re now creating a tracking issue for each task, which is the root of implementation. But we still don’t have a strong requirement that “substantial” changes go through RFCs, which are the root of design.

Any contributor should be able to learn a feature by its design discussions and implementation steps.

feitian124 · September 14, 2021, 4:07am

I think restructure tests and its design document are a good example.

It may be simple, but the process is clear enough.

zhangyangyu · September 14, 2021, 3:57am

I have experience in the Python community and I highly recommend its proposal strategy. Even after 10 or more years, I can still get all the points I want from a PEP (including background, discussions, and all proposed ideas).

The enhancement proposal process for Python is described in detail in PEP 1. In brief, the stages of an enhancement lifecycle are:

Discuss the idea in python-list@python.org or python-ideas@python.org.
If the reactions from the community are positive, write a draft PEP and publish it to python-dev@python.org for discussion.
Collect community feedback and adapt the PEP.
Submit the final version for review and wait for the call.

All the communication in the process is public and can be looked up in the future by anyone.

Successful PEPs usually contain some common parts (see “What belongs in a successful PEP?”). Take PEP 567 as an example. Reading it gives readers a clear description of why the proposal is needed (Rationale) and how it should work (Introduction, Specification, Implementation, Summary of New APIs). These give feature implementors and users a clear vision of what the feature should be. Beyond that, the PEP also records the results of the discussion stage (Rejected Ideas) and the impact of the feature (Backwards Compatibility). If you want more discussion details or related background, you can follow the links in the References section. In short, a PEP gives you all the information about what the feature is and what it affects, why the feature is justified, and how to implement and use it.

tison · September 14, 2021, 4:02am

I’d nominate “FLIP-76: Unaligned Checkpoints” as a good example.

FLIP-76: Unaligned Checkpoints

This design document is for implementing a new checkpoint strategy for Apache Flink.

It starts with several quick private discussions among component experts to decide whether to go ahead and what to do.

Soon after they’re convinced of the direction, Arvid Heise starts a discussion on the mailing list.

[DISCUSS] FLIP-76: Unaligned checkpoints

It attracts other contributors interested in the concept to participate in the discussion, and they reach consensus through a voting thread.

[VOTE] [FLIP-76] Unaligned checkpoints

The design document itself is laid out as follows:

Motivation (the reason we work for a feature is essential)
Preliminary work (POC and rough benchmark)
Proposed Changes (very detailed in every perspective affected)
Compatibility, Deprecation, and Migration Plan (fundamental software must take care of public interfaces)
Known Limitations
Test Plan (we need test coverage, definitely)
RoadMap (an implementation plan)

Later, the implementation is tracked in an umbrella issue.

FLINK-14551 Unaligned checkpoints

BusyJay · September 14, 2021, 9:03am

+1 for this, I hope to see more proposals in tikv/rfcs before developing.

Hoshea · September 14, 2021, 11:56am

I’d nominate “The first design of Satellite 0.1.0” as a good example, which is from the Apache SkyWalking community.

The design document is about implementing a light-weight sidecar that collects metrics, traces, and logs.

I was deeply attracted by its really detailed introduction and clear figures. The design doc is just like implementing the project in natural language!

I think a good design document is like reading poetry, and you will have a clear idea about how to accomplish it step by step once you have read it.

bb7133 · September 14, 2021, 1:48pm

I would like to talk about the RFCs of CockroachDB, and nominate “version migration” as a good example.

It describes a solution for dealing with incompatible changes during an in-place upgrade of a cluster, with a lot of details and explanations.

I think what we can learn from it is that it’s community friendly: it provides both guide-level and reference-level explanations, as well as a ‘recap’, so that contributors who know little of the background are able to understand the design.

Also, as TiDB is facing a similar problem, IMHO we can learn a lot from the solution of CRDB.

bb7133 · September 14, 2021, 3:25pm

I would also recommend the ‘placement rule in SQL’ RFC of TiDB (authored by @djshow832 and @morgo), which includes:

a brief but clear description of the motivation.
a detailed design with a user-oriented interface and internal implementation.
the examples of the use-cases.
the alternatives, risks and limitations.

IMHO the only concern is it is complicated for engineers who are not familiar with TiDB architecture, but I believe this is partly because the feature IS complicated.

mjonss · September 16, 2021, 11:56pm

As a reference in MySQL, features are developed as Work Logs and are first designed before the implementation is done. The result can be seen in https://dev.mysql.com/worklog/ after the worklog has been released in a MySQL release. We can get some inspiration from the format etc. but we should be open to the community while deciding on the design.

tison · September 18, 2021, 1:03pm

Thanks for your support! Would you propose other examples that we, the community, can learn from?

morgo · September 24, 2021, 3:48am

A couple of things I like about the MySQL worklogs:

They are generally less verbose (not always a good thing; but some of our proposals are very detailed which makes them hard to read).
They consider what testing will be required up front as a checklist.

What I don’t like about them:

There are no user stories. Sometimes the goal of the feature is not clear (and when it comes to implementation, understanding the user-story/use case is very useful if any tradeoffs or limitations need to be considered).

As a more general comment, one thing I’ve noticed about a lot of proposals is I end up asking “what about X”, and the proposer says “that’s important, but out of scope for this proposal” (Recent Example).

Maybe it would be nice to add a “Goals and Non Goals” section (as bullet points) in the proposal introduction?

tison · October 8, 2021, 3:16am

Let me try to add it today or tomorrow.

UPDATE: This is taking longer than estimated, but I’ll keep a note of it.

feitian124 · September 27, 2021, 11:18am

This is good: its tech design is clear and its sub-tasks are carefully tracked.
I believe one can easily figure out why, when, by whom, and how temporary table is implemented.