Proposal: add issue number to PR title

Background

Currently, our ti-chi-bot will use the title of the PR as the message title of the commit generated after the squash merge, we can know this commit comes from which PR

However, because we maintain multiple release branches at the same time, PRs that fix the same problem have different PR numbers on different branches. We can’t directly know whether an issue has been fixed in the corresponding release branch through an issue number.

Content

So we propose that add the issue number to the PR title and check it when PR be opened so that we can easily query whether an issue has been fixed on a release branch through git log.

We can use the same issue to introduce in detail which problems the associated PR has solved, which new features have been implemented, and which modifications have been made so that we can get a better release note when we release a new version.

  • If an issue needs to submit multiple PRs to fix, the titles of these PRs can all be associated with this issue number

  • In multiple release branches, PRs that fix the same issue can be associated with the same issue number

  • For a PR to resolve the issue of another repository, it is recommended to create a new issue in the current repository, and make it one of the subtasks of the original issue

To be discussed

Question: What is the new PR Title Format?

Option 1:

The new title format of the PR can be as follows, enclose the associated issue number with a pair of [ and ] symbols and put it behind the title.

pkg: what's changed in this one package [123, 124]

After squash merge, the commit message will be:

pkg: what's changed in this one package [123, 124] (#14232)

The pattern of check regex can be: https://regex101.com/r/44eVG5/1

^.+: .{10,160}\s*\[[1-9]\d*(,\s*([1-9]\d*))*\]$

Option 2:

[issue-11123][issue-11124] pkg: what's changed in this one package

After squash merge, the commit message will be:

[issue-11123][issue-11124] pkg: what's changed in this one package (#14232)

Notice: This format will take up more length, especially when a PR references multiple issues.

Option 3:

[TI-11123][TI-11124] pkg: what's changed in this one package

After squash merge, the commit message will be:

[TI-11123][TI-11124] pkg: what's changed in this one package (#14232)

The regex can be: https://regex101.com/r/U6Hvcg/1

^(\[TI-[1-9]\d*\])+.+: .{10,160}$

Option 4:

[#11123][#11124] pkg: what's changed in this one package

After squash merge, the commit message will be:

[#11123][#11124] pkg: what's changed in this one package (#14232)

Notice: his format causes the commit to reference the issue.

Question: Should all the PR have associated issues and fill in the issue number to the title?

The current suggestion is that all the PR should be associated with at least one issue, because even some fix typo issues may have a larger impact, we also need to consider showing them on the release note.

Implementation plan

  • Reach consensus on the problem about PR title associate issue number
  • Modify the repository‘s PR template and CI Title Checker’s prompts, to make sure the newly opened PR gradually switch to the new title format
  • After a period of implementation, we will modify the regular expression of the title checker to make sure the title of subsequent PR is correct.

cc: @tison, @zhouqiang-cl, @zhangyangyu

Suggestions are welcome!

If a PR fixes multiple issues, how should the title format be defined?

I don’t think this is a problem. One PR should fix only one thing, this is a common sense. If the multiple issues refer to multiple things, contributors should not submit all the fixes in a single PR. If the multiple issues refer to the same thing, only one should be left open and the others should be closed as duplicate.

And I’m not clear about its consequence. In the migration period, the title will be checked and a warning message will be prompted to the users if title has no issue number. After migration, this requirement becomes mandatory and the issue can’t be merged if title is not valid. Right?

And will the issue number be checked for existence?

And I’m not clear about its consequence. In the migration period, the title will be checked and a warning message will be prompted to the users if title has no issue number. After migration, this requirement becomes mandatory and the issue can’t be merged if title is not valid. Right?

Yes, during the migration period, it will check whether the title contains the issue number and give a warning, although it is not correct, the CI will not fail. After the migration is completed, it will be used as one of the required conditions for the merge.

Actually, I would rather be able to reach a consensus through other forms of announcements, rather than warnings from robots.

And will the issue number be checked for existence?

We can check it through GitHub API.

title=$1
issue_number=$(echo $title | egrep -o "\[#\d+\]" | egrep -o "\d+")
result=$(
	curl -s \
	-H "Authorization: $TOKEN" \
	-H "Accept: application/vnd.github.v3+json" \
	"https://api.github.com/repos/pingcap/tidb/issues/$issue_number"
)

found_number=$(echo $result | egrep "\"number\": \d+,")
found_pr_field=$(echo $result | egrep "\"pull_request\":")

if [ -z "$found_number" ]
then
	echo "issue not found"
else
	if [ -n "$found_pr_field" ]
	then
		echo "you can't fill in pr number in issue number field"
	else
		echo "great! The title has issue number."
	fi
fi

Test result:

1 Like

Thanks for starting this discussion @Mini256 ! I have two concerns about how we apply this rule:

  1. Since we have issue number now, do you still have to require a package prefix? I think a regex ^\[#\d+\] .{20, 160}$ and check the issue number valid is enough.
  2. @zhangyangyu I totally agree that one PR SHOULD fix only one issue. However, TiDB code is currently tight coupled, which means we sometimes have to fix things a bit different in one pass. I’ll bring an example below. You may argue that we then do the refactor first and fix the problem one by one. If it is OK to our maintainers, I’m glad to see it’s happening.

For example, #28102 fixes #28075 and #28101. Of course, we can first fix the test or do migration and then the other. However, it requires timely review that we can do quick small fixes instead of one big fix.

Other examples may comes from planner and executor where we sometimes do several things in one PR since the code is coupled.

I tend to keep one PR related to one issue. But, it must be aware of our developers instead of pushing forward with little awareness.

One PR should fix only one thing, this is a common sense.

It is not true and too ideal.

For example https://github.com/pingcap/tidb/pull/27697 fixes #19025 and #18488.

The two issues are in different scenarios but one PR can fix both.

And we definitely have many other automatic ways to know whether a issue is fixed in a release branch without introducing extra man work of contributors.

For example, a bot can trace the corresponding PRs for an issue, and when a PR is merged into a release branch, it tags or edits the description of the issue.

2 Likes

It is hard to implement due to GitHub shortcoming.

Not really. I second your opinion about one PR MAY fix multiple issues due to the current situation. But a PR containing issue number is a common practice. We’re now in another situation that multiple PRs doesn’t associate with an issue that makes issue a secondary topic and it hurts development activities comprehension - we focus on code, not the problem and solution.

See also:

not really, maybe you can try graphql.

see

https://docs.github.com/en/graphql/reference/objects#closedevent

and

https://docs.github.com/en/graphql/reference/objects#:~:text=willCloseTarget%20(Boolean!),source%20is%20merged

2 Likes

I really want the CI bot can automatically add the closed issue id and url in the comment message, but I am reluctant to add it manually in the PR title, not only because it is cumbersome but also because it is ugly.

How does any automatic logic get aware of which issue to be associated with? Do you mean we search in the PR body and do the same thing we proposed to apply on PR title?

As mentioned above, in either way the author should associate a PR with an issue (or probably multiple). If so, check the title is more determined.

We’re not saying using a (new) mechanism to force something, but notice any contributor takes care of filing an issue first so that we don’t write code only but focus on the problem and its resolution.

So far, there are works without an issue (as below), while a PR focus more or less mainly on the code - they have difference points.

Thanks for your information! It will help :slight_smile:

if the PR body claims it will close certain issue, github will create an willCloseTarget event, which can be obtained using GraphQL api.

@Mini256 in this way, can we generate commit message with issue number? It convinces me on the message collecting and checking part.

@ichn-hu can you create an example querying the linked issue of a specific PR by GraphQL?

I already have that, see my previous issue tracker

We can get the PR reference event on the timeline of an issue through graphsql query, but what’s disturbing is that we can’t get the referenced issue by querying a PR conversely.

Tide (the component responsible for merging PR) obtains some PR information through a graphsql query before generating the commit message. We can use the information to fill in the commit message template, but the current problem is that we cannot get the issue associated with PR.

Amazing! It seems that GitHub has added this feature.

query { 
  repository (owner:"pingcap" name:"tidb") { 
    pullRequest(number: 28204) {
      closingIssuesReferences(first: 10) {
        nodes {
          number
        }
      }
    }
  }
}

Can tide populate issue number from PR body in commit message in your opinion?