Error: failed to start tikv: failed to start

I’m trying to set it up on CentOS 7 and keep getting this error. Here is the log output:

{"error": "operation timed out after 2m0s"}
{"task": "StartCluster", "error": "failed to start tikv: failed to start: <ip_address> tikv-20160.service, please check the instance's log(/tidb-deploy/tikv-20160/log) for more detail.: timed out waiting for port 20160 to be started after 2m0s", "errorVerbose": "timed out waiting for port 20160 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:119\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:151\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:534\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to start: <ip_address> tikv-20160.service, please check the instance's log(/tidb-deploy/tikv-20160/log) for more detail.\nfailed to start tikv"}
2023-02-14T20:38:20.613Z	INFO	Execute command finished	{"code": 1, "error": "failed to start tikv: failed to start: <ip_address> tikv-20160.service, please check the instance's log(/tidb-deploy/tikv-20160/log) for more detail.: timed out waiting for port 20160 to be started after 2m0s", "errorVerbose": "timed out waiting for port 20160 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:119\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:151\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:534\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to start: <ip_address> tikv-20160.service, please check the instance's log(/tidb-deploy/tikv-20160/log) for more detail.\nfailed to start tikv"}

When I installed it inside a CentOS 7 Docker container, it worked fine, but when I try to use it on my server it gives this error. I’m using tiup to set up TiDB.

Follow what the log said:

please check the instance’s log(/tidb-deploy/tikv-20160/log) for more detail.

I have looked at it, but I can’t understand it.

I have opened the ports, but they do not show up in ss or netstat.

Check /tidb-deploy/tikv-20160/log/tikv.log on the node where the failed TiKV instance is deployed.
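
A tikv.log can be long, so filtering for the highest-severity entries is usually the quickest way to find why the instance did not start. A minimal sketch, assuming the log path from the tiup error message and the bracketed level tags (`[FATAL]`, `[ERROR]`) that TiKV writes; the function name `show_tikv_failures` is made up for illustration:

```shell
# Print the last few FATAL/ERROR entries of a TiKV log.
# $1: log file (defaults to the path named in the tiup error message).
# $2: how many matching lines to show (defaults to 20).
show_tikv_failures() {
  log_file="${1:-/tidb-deploy/tikv-20160/log/tikv.log}"
  max_lines="${2:-20}"
  grep -E '\[(FATAL|ERROR)\]' "$log_file" | tail -n "$max_lines"
}
```

Run `show_tikv_failures` on the TiKV host. Only the first line of each entry is printed, so a multi-line backtrace still has to be read in the file itself.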

the log

TiKV cannot find the OPTIONS file of RocksDB. The data directory may be corrupted.
Related code: https://github.com/tikv/tikv/blob/v6.5.0/components/engine_rocks/src/util.rs#L80
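
To confirm this on the host, one can look for OPTIONS-* files in the RocksDB directory. A rough sketch: the default path below is an assumption (the `data_dir` of `/tidb-data` from the topology, tiup's usual `tikv-20160` subdirectory, and the `db` subdirectory), and `has_rocksdb_options` is a hypothetical helper name:

```shell
# Report whether a RocksDB directory contains any OPTIONS-* file.
# $1: RocksDB directory (the default path is an assumption, adjust as needed).
has_rocksdb_options() {
  db_dir="${1:-/tidb-data/tikv-20160/db}"
  if ls "$db_dir"/OPTIONS-* >/dev/null 2>&1; then
    echo "OPTIONS file present in $db_dir"
  else
    echo "no OPTIONS file in $db_dir"
    return 1
  fi
}
```

If the directory holds other RocksDB files but no OPTIONS file, that matches the fatal message in tikv.log.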

[2023/02/16 15:04:04.880 -06:00] [FATAL] [lib.rs:495] ["couldn't find the OPTIONS file"] [backtrace="   0: tikv_util::set_panic_hook::{{closure}}
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/tikv_util/src/lib.rs:494:18
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2032:9
      std::panicking::rust_panic_with_hook
             at /rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:692:13
   2: std::panicking::begin_panic::{{closure}}
             at /rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:608:9
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:137:18
   4: std::panicking::begin_panic
             at /rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:607:12
   5: engine_rocks::util::new_engine_opt::{{closure}}
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/engine_rocks/src/util.rs:80:32
      core::option::Option<T>::unwrap_or_else
             at /rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs:828:21
      engine_rocks::util::new_engine_opt
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/engine_rocks/src/util.rs:78:24
   6: tikv::server::engine_factory::KvEngineFactory::create_tablet
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/src/server/engine_factory.rs:156:25
   7: <tikv::server::engine_factory::KvEngineFactory as engine_traits::engine::TabletFactory<engine_rocks::engine::RocksEngine>>::create_shared_db
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/src/server/engine_factory.rs:224:22
   8: server::server::TikvServer<CER>::init_raw_engines
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/server/src/server.rs:1826:25
   9: server::server::run_impl
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/server/src/server.rs:162:35
      server::server::run_tikv
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/server/src/server.rs:197:5
  10: tikv_server::main
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/cmd/tikv-server/src/main.rs:210:5
  11: core::ops::function::FnOnce::call_once
             at /rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:513:5
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:121:18
  12: main
  13: __libc_start_main
  14: <unknown>
"] [location=components/engine_rocks/src/util.rs:80] [thread_name=main]

How can I fix this?

If the cluster holds no production data, just destroy it and deploy again.
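
With tiup, the destroy-and-redeploy route might look like the sketch below. The cluster name, version, and topology file are placeholders to be supplied by the caller, and `redeploy_cluster` is a hypothetical wrapper, not a tiup feature. Depending on how SSH access is set up, `tiup cluster deploy` may also need options such as `--user`, `-p`, or `-i`.

```shell
# Destroy a test cluster and deploy it again from the same topology file.
# WARNING: destroy deletes the cluster's data; only use this when nothing must be kept.
#   $1 cluster name, $2 TiDB version (for example v6.5.0), $3 topology file
redeploy_cluster() {
  cluster_name="$1"
  version="$2"
  topology="$3"
  # tiup asks for confirmation before destroying the cluster.
  tiup cluster destroy "$cluster_name" || return 1
  tiup cluster deploy "$cluster_name" "$version" "$topology" || return 1
  tiup cluster start "$cluster_name"
}
```

For example: `redeploy_cluster my-cluster v6.5.0 cluster.yaml`, where `my-cluster` stands for whatever name the cluster was originally deployed under.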

If you want to recover the node and the other replicas are healthy, you can try scaling the node in and then scaling out again.
Refer to https://docs.pingcap.com/tidb/stable/scale-tidb-using-tiup
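
As a rough illustration of that scale-in and scale-out sequence, assuming a cluster with enough healthy TiKV replicas: the two helper names and the scale-out topology file are hypothetical, and the node ID follows tiup's host:port form.

```shell
# Step 1: take the broken TiKV instance out of the cluster, then show the topology.
#   $1 cluster name, $2 node ID as host:port (for example 173.208.190.18:20160)
remove_tikv_node() {
  tiup cluster scale-in "$1" --node "$2" || return 1
  tiup cluster display "$1"
}

# Step 2: add a fresh TiKV instance described in a scale-out topology file.
#   $1 cluster name, $2 path to the scale-out topology file
add_tikv_node() {
  tiup cluster scale-out "$1" "$2"
}
```

Taking a TiKV node offline is not instantaneous, so it is worth checking the `tiup cluster display` output until the old instance has finished going offline before running the second step. On a cluster with a single TiKV instance this route does not apply.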

Otherwise, you could try copying /db/OPTIONS-xxxx from another TiKV instance, but I’m not sure it will work, and other files may be corrupted as well.

Online unsafe recovery will also be necessary if any Region has lost the majority of its replicas. Please refer to https://docs.pingcap.com/tidb/stable/online-unsafe-recovery.

Actually, I am setting up a new database node.

It has no data, but whenever I try to create a new cluster I get this error.

cluster.yaml

# For more information about the format of the tiup cluster topology file, consult
# https://docs.pingcap.com/tidb/stable/production-deployment-using-tiup#step-3-initialize-cluster-topology-file

# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.
global:
  # # The OS user who runs the tidb cluster.
  user: "root"
  # # SSH port of servers in the managed cluster.
  ssh_port: 22
  # # Storage directory for cluster deployment files, startup scripts, and configuration files.
  deploy_dir: "/tidb-deploy"
  # # TiDB Cluster data storage directory
  data_dir: "/tidb-data"
  # # Supported values: "amd64", "arm64" (default: "amd64")
  arch: "amd64"

pd_servers:
  - host: 173.208.190.18

tidb_servers:
  - host: 173.208.190.18

tikv_servers:
  - host: 173.208.190.18

monitoring_servers:
  - host: 173.208.190.18

grafana_servers:
  - host: 173.208.190.18

Clean up /tidb-data on 173.208.190.18 and try again.
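
Files left in the data directory by an earlier attempt are the likely reason a brand-new TiKV looks for an OPTIONS file at all. One cautious way to clean up is to move the directory aside rather than delete it, after the cluster's services have been stopped or the cluster destroyed. A minimal sketch; `set_aside_data_dir` is a hypothetical helper and the `.bak.<timestamp>` suffix is an arbitrary choice:

```shell
# Move a data directory out of the way instead of deleting it outright.
# $1: directory to set aside (for example /tidb-data).
# Prints the backup path; reports and does nothing if the directory is missing.
set_aside_data_dir() {
  data_dir="$1"
  [ -d "$data_dir" ] || { echo "nothing to clean: $data_dir"; return 0; }
  backup="${data_dir%/}.bak.$(date +%Y%m%d%H%M%S)"
  mv "$data_dir" "$backup" && echo "$backup"
}
```

After `set_aside_data_dir /tidb-data`, deploy the cluster again so that tiup recreates the data directories, and remove the backup once the new cluster starts cleanly.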