TiKV X JuiceFS On The Go

Hello everyone! This is Sandy from the JuiceFS team. We are developing a new metadata engine for JuiceFS based on TiKV. Here is some background information.

JuiceFS is a distributed POSIX file system specially optimized for cloud-native environments. In JuiceFS, data is persisted in object storage (e.g. Amazon S3), while metadata can be stored in various databases such as Redis, MySQL or TiDB. JuiceFS is used in many scenarios such as big data analytics, machine learning, and shared storage. Below is an overview of the system architecture:

The metadata engine is a critical component of a distributed file system. JuiceFS chose Redis as its first engine, considering its good performance and wide adoption in many organizations. However, Redis is not suitable for scenarios that require high reliability or store billions of files. Thus, a SQL interface (supporting SQLite, MySQL, TiDB, PostgreSQL, etc.) was developed as the second metadata engine. Although TiDB is a great choice for users who value reliability and scalability, we believe TiKV can offer the same advantages with a simpler architecture and higher performance.

Currently the new metadata engine is under development; more information can be found here. Contributions and discussion are welcome: you may leave comments under this thread or contact us directly.

5 Likes

The first PR has been merged! After some basic tests and benchmarks, we saw the POWER of TiKV. Here are the results:

  • The table shows the time cost in microseconds (µs) for each operation; smaller is better
  • The number in parentheses is the ratio to the Redis-always cost
  • Redis appendfsync configuration:
    • Always: fsync after each commit
    • Everysec: fsync every second

Note: Redis & MySQL have only 1 replica of data (local storage) while TiKV has 3 replicas (raft group)

Operation     Redis-always  Redis-everysec  MySQL        TiKV
mkdir         968           704 (0.7)       2368 (2.4)   2174 (2.2)
mvdir         1067          912 (0.9)       3708 (3.5)   2315 (2.2)
rmdir         976           783 (0.8)       2965 (3.0)   2469 (2.5)
readdir_10    370           353 (1.0)       1322 (3.6)   1087 (2.9)
readdir_1k    1832          1818 (1.0)      15295 (8.3)  6688 (3.7)
mknod         978           685 (0.7)       2307 (2.4)   2187 (2.2)
create        919           681 (0.7)       2333 (2.5)   2118 (2.3)
rename        1030          887 (0.9)       3722 (3.6)   2328 (2.3)
unlink        933           701 (0.8)       3370 (3.6)   2354 (2.5)
lookup        137           115 (0.8)       407 (3.0)    634 (4.6)
getattr       121           110 (0.9)       371 (3.1)    322 (2.7)
setattr       606           440 (0.7)       1282 (2.1)   1883 (3.1)
access        124           112 (0.9)       363 (2.9)    317 (2.6)
setxattr      238           113 (0.5)       1185 (5.0)   1659 (7.0)
getxattr      109           109 (1.0)       340 (3.1)    314 (2.9)
removexattr   250           118 (0.5)       868 (3.5)    2007 (8.0)
listxattr_1   116           105 (0.9)       349 (3.0)    316 (2.7)
listxattr_10  117           115 (1.0)       404 (3.5)    334 (2.9)
link          712           569 (0.8)       2713 (3.8)   2117 (3.0)
symlink       978           682 (0.7)       2646 (2.7)   2141 (2.2)
newchunk      238           107 (0.4)       1 (0.0)      1 (0.0)
write         822           568 (0.7)       3256 (4.0)   2335 (2.8)
read_1        0             0 (0.0)         0 (0.0)      0 (0.0)
read_10       0             0 (0.0)         0 (0.0)      0 (0.0)
3 Likes

Looks great! I'm curious about the latency of each operation. Do you have any benchmark results for that?

You mean the latency of TiKV operations? No, I don't have that. Is there any way to get TiKV's internal statistics?

Maybe this could be measured on the application side, e.g. how long each readdir op takes when multiple operations are running concurrently.

Well, we don't have those details for now. The results shown above were obtained with a Go benchmark test, and only average latencies are recorded.
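
For anyone who wants per-operation distributions rather than only averages, below is a minimal Go benchmark sketch of measuring concurrent readdir latency on the application side. The readdirOnce function is a hypothetical placeholder for a single readdir call against the metadata engine, not part of the real JuiceFS code, and the reported metrics are just an illustration.

```go
package metabench

import (
	"sort"
	"sync"
	"testing"
	"time"
)

// readdirOnce is a hypothetical placeholder for one readdir call against
// the metadata engine under test; plug in the real client call here.
func readdirOnce() error { return nil }

// BenchmarkReaddirConcurrent issues readdir from multiple goroutines and
// reports the average and p99 latency instead of only the overall average.
func BenchmarkReaddirConcurrent(b *testing.B) {
	var (
		mu        sync.Mutex
		latencies []time.Duration
	)
	b.RunParallel(func(pb *testing.PB) {
		local := make([]time.Duration, 0, 1024)
		for pb.Next() {
			start := time.Now()
			if err := readdirOnce(); err != nil {
				b.Error(err)
				continue
			}
			local = append(local, time.Since(start))
		}
		mu.Lock()
		latencies = append(latencies, local...)
		mu.Unlock()
	})
	if len(latencies) == 0 {
		return
	}
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	var sum time.Duration
	for _, d := range latencies {
		sum += d
	}
	b.ReportMetric(float64(sum.Microseconds())/float64(len(latencies)), "avg-us/op")
	b.ReportMetric(float64(latencies[len(latencies)*99/100].Microseconds()), "p99-us")
}
```

Running it with `go test -bench . -cpu 4,8,16` varies the number of concurrent workers, since RunParallel starts GOMAXPROCS goroutines by default.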

1 Like

TiKV as a metadata engine for JuiceFS is now fully supported. It passes all tests in pjdfstest and provides slightly better performance than MySQL. The latest benchmark results will be recorded in this doc.

2 Likes

TiKV keeps improving its performance. 5.1.1 separates read/write ready, which brings some improvement. 5.3 (about 2 months later) will introduce async raft, which brings a more significant improvement (especially on ordinary disks).

2 Likes

Sounds great! We’ll keep an eye on new features.

Did you meet any problems when integrating TiKV into JuiceFS? For example, performance tuning, unexpected behavior, etc.

For now everything is smooth :+1:
We haven't done much performance tuning yet. Currently 1PC & AsyncCommit are enabled before committing (check here), while all other configurations are left at their defaults.
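
For readers unfamiliar with these options, here is a minimal sketch of how they can be turned on with the TiKV Go client (github.com/tikv/client-go/v2); the PD address is a placeholder and error handling is simplified, so treat it as an illustration rather than the exact JuiceFS code.

```go
package main

import (
	"context"

	"github.com/tikv/client-go/v2/txnkv"
)

func main() {
	// Placeholder PD endpoint; point this at the PD nodes of your cluster.
	client, err := txnkv.NewClient([]string{"127.0.0.1:2379"})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	txn, err := client.Begin()
	if err != nil {
		panic(err)
	}
	// Opt in to the one-phase-commit and async-commit optimizations
	// before the transaction is committed.
	txn.SetEnable1PC(true)
	txn.SetEnableAsyncCommit(true)

	if err := txn.Set([]byte("demo-key"), []byte("demo-value")); err != nil {
		panic(err)
	}
	if err := txn.Commit(context.Background()); err != nil {
		panic(err)
	}
}
```

Both options only take effect on TiKV versions that support them (around 5.0 and later).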

1 Like

Great work! I'm curious why you chose a key-value database as the metadata store engine. In traditional file systems, a directory tree is more popular for metadata management. Have you compared these two approaches? Key-value semantics do not seem a natural fit for describing file system directories and files, although a key-value database may perform well in bandwidth and latency while causing some other problems.

Do you have any more details about the comparison of these two solutions?

The file system IS organized as a directory tree, in which every node is managed by several key-value entries. For example, a regular file may have (see the rough sketch below):

  • a dentry: {parent inode, file name} --> {file inode, file type}
  • an inode info: {file inode} --> {encoded file attributes}
  • several chunks: {file inode, chunk ID} --> {encoded chunk infos}

You can find more information here: https://github.com/juicedata/juicefs/blob/main/pkg/meta/tkv.go#L176-L199
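
To make the mapping above more concrete, here is a rough Go sketch of how such keys could be composed. The prefixes and field layout are simplified assumptions for illustration; the real encoding JuiceFS uses is in the tkv.go code linked above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Ino is a file system inode number.
type Ino uint64

// dentryKey maps {parent inode, file name} to the child entry.
// The 'D'/'I'/'C' prefixes below are illustrative, not JuiceFS's real encoding.
func dentryKey(parent Ino, name string) []byte {
	k := append([]byte{'D'}, u64(parent)...)
	return append(k, name...)
}

// inodeKey maps {file inode} to the encoded file attributes.
func inodeKey(ino Ino) []byte {
	return append([]byte{'I'}, u64(ino)...)
}

// chunkKey maps {file inode, chunk index} to the encoded chunk info.
// Big-endian encoding keeps a file's chunks sorted by index, so they can
// be read back with a single range scan.
func chunkKey(ino Ino, index uint32) []byte {
	k := append([]byte{'C'}, u64(ino)...)
	return binary.BigEndian.AppendUint32(k, index)
}

func u64(v Ino) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, uint64(v))
	return b
}

func main() {
	fmt.Printf("dentry key: %q\n", dentryKey(1, "hello.txt"))
	fmt.Printf("inode key:  %q\n", inodeKey(42))
	fmt.Printf("chunk key:  %q\n", chunkKey(42, 0))
}
```

In such a layout, all entries of one directory share a key prefix, so a readdir can be served by a single prefix scan.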