Proposal: Merge DM repo into TiCDC

Motivation

TiDB Data Migration (DM) is an integrated data migration task management platform, which supports data migration from MySQL-compatible databases (such as MySQL, MariaDB, and Aurora MySQL) into TiDB.

We would like to move the DM code into the TiCDC repo for these reasons:

  1. DM and TiCDC share much of their processing logic: the main function of both tools is to synchronize, in real time, a specified type of upstream data flow (MySQL, MariaDB, TiDB…) to a downstream (MySQL, MariaDB, TiDB, Kafka…). Currently the two tools are independent, each with its own implementation of the upstream and downstream data interfaces and of the internal data flow processing. Merging the two repos will make it easier to integrate this code and will reduce maintenance costs.
  2. Merging DM and TiCDC reduces the number of product concepts and gives users a unified data platform, which lowers their learning cost.

Steps

The steps below migrate a minimal DM codebase into TiCDC, which allows DM to be built and released from the TiCDC repository (even if it is not the latest version).

  1. git checkout -b clone-dm && git subtree add -P dm https://github.com/pingcap/dm.git master

  2. git checkout -b merge-dm (to create a diff branch, so the PR would be easier to review…)

  3. Do the necessary merging:

    1. Merge go.mod and go.sum.
    2. Update the Makefile.
    3. Update the import paths for both TiCDC and DM.
    4. Merge the CI scripts.
  4. Create a PR from merge-dm targeting clone-dm for review.

  5. After reviewing, merge it to master.
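
The import path update in step 3 could be scripted roughly as follows. This is only a minimal sketch, not the script actually used for the merge: the new module path github.com/pingcap/ticdc/dm is an assumption (the real value depends on the module line in TiCDC's go.mod), and rewrite_imports is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Old DM module path, written as a regex (dots escaped).
OLD_RE='github\.com/pingcap/dm'
# ASSUMPTION: DM packages end up under <TiCDC module path>/dm.
NEW_PATH='github.com/pingcap/ticdc/dm'

# rewrite_imports DIR (hypothetical helper):
# rewrite quoted DM import paths in every .go file under DIR.
rewrite_imports() {
  local dir="$1"
  # Anchor on the opening quote, and require "/" or the closing quote
  # right after the old path, so that a sibling path such as
  # github.com/pingcap/dm-operator is left untouched.
  # -i.bak is used because it works with both GNU and BSD sed.
  find "$dir" -type f -name '*.go' \
    -exec sed -i.bak -E "s#\"${OLD_RE}(/|\")#\"${NEW_PATH}\1#g" {} +
  # Remove the backup files that sed -i.bak leaves behind.
  find "$dir" -type f -name '*.go.bak' -delete
}
```

Running `rewrite_imports .` from the repository root would cover both the DM code under dm and any TiCDC file that imports DM packages. It does not touch go.mod or go.sum, which still need to be merged by hand as in step 3.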

Risks

The change to TiCDC is the addition of a new directory, dm, so the risk is under control.

Reference

Proposal: Merge BR repo into TiDB

Does this affect usage? It seems DM is used as one big public cluster, while TiCDC is treated as a component of tidb-cluster.

I think the repository name is not a big deal. We are thinking of a new repository name for the codebase, which will contain both the DM and CDC products.