Some questions about TiKV Coprocessor

mapleFU · July 14, 2021, 3:13am

Hi, I’m getting started with reading the TiKV Coprocessor code, and I have some naive questions:

TiKV now uses Chunk to optimize memory usage and query performance. Among Chunk’s data structures, there is:

```rust
/// A container stores an array of datums, which can be either raw (not decoded), or decoded into
/// the `VectorValue` type.
///
/// TODO:
/// Since currently the data format in response can be the same as in storage, we use this structure
/// to avoid unnecessary repeated serialization / deserialization. In future, Coprocessor will
/// respond all data in Chunk format which is different to the format in storage. At that time,
/// this structure is no longer useful and should be removed.
#[derive(Clone, Debug)]
pub enum LazyBatchColumn {
    Raw(BufferVec),
    Decoded(VectorValue),
}
```
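For readers new to this pattern, the idea behind `LazyBatchColumn` (keep datums encoded until someone actually needs them decoded) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not TiKV's implementation: `LazyColumn`, `ensure_decoded` and `decoded` are hypothetical names, raw datums are assumed to be 8-byte little-endian `i64` values with an empty datum standing in for NULL, and `Vec<Option<i64>>` stands in for `VectorValue`.

```rust
/// Hypothetical stand-in for TiKV's `LazyBatchColumn`. Raw datums are
/// assumed to be 8-byte little-endian i64s; an empty datum means NULL.
#[derive(Clone, Debug)]
enum LazyColumn {
    /// Datums kept in their encoded form (stand-in for `BufferVec`).
    Raw(Vec<Vec<u8>>),
    /// Datums decoded into a typed vector (stand-in for `VectorValue`).
    Decoded(Vec<Option<i64>>),
}

impl LazyColumn {
    /// Decodes the column in place the first time it is needed.
    /// Calling it again on an already decoded column does nothing.
    fn ensure_decoded(&mut self) -> Result<(), String> {
        if let LazyColumn::Raw(raw) = self {
            let mut values = Vec::with_capacity(raw.len());
            for datum in raw.iter() {
                if datum.is_empty() {
                    values.push(None); // assumed NULL encoding
                } else if datum.len() == 8 {
                    let mut bytes = [0u8; 8];
                    bytes.copy_from_slice(datum);
                    values.push(Some(i64::from_le_bytes(bytes)));
                } else {
                    return Err(format!("bad datum length {}", datum.len()));
                }
            }
            // Replace the raw buffer with its decoded form.
            *self = LazyColumn::Decoded(values);
        }
        Ok(())
    }

    /// Returns the decoded values, or `None` if the column is still raw.
    fn decoded(&self) -> Option<&[Option<i64>]> {
        match self {
            LazyColumn::Decoded(values) => Some(values.as_slice()),
            LazyColumn::Raw(_) => None,
        }
    }
}
```

Keeping the `Raw` variant means a column that is never inspected, for example one that is only forwarded to the response, can skip decoding entirely, which matches the motivation given in the TODO comment.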

I have gone through the code of some scanners, and it seems that Raw(BufferVec) is not used except in tests and benchmarks. Is that right?

While going through IndexScanExecutor, I found some logic related to clustered indexes, but I can’t open the design doc for it: https://docs.google.com/document/d/1Co5iMiaxitv3okJmLYLJxZYCNChcjzswJMRr-_45Eqg/edit?usp=sharing

skyzh · July 14, 2021, 3:18am

Direct copying between serialized chunk data and chunk-format vectors is not implemented yet. There were some bugs, and the PR was put aside.

More efficient data encoding: directly copy encoded chunk-format data from memory into protobuf fields. There were some attempts, but they failed due to some unknown bugs.
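The difference between decoding-then-re-encoding and a direct copy can be illustrated with a toy layout. Assume, hypothetically, that a fixed-width `i64` column is stored as contiguous little-endian 8-byte values with no null bitmap; this is not TiDB's real chunk format, and `reencode_column`, `copy_column` and `WIDTH` are made-up names for illustration only. When the stored bytes already match the output format, they can be appended in one call instead of being processed value by value.

```rust
/// Toy layout (an assumption, not TiDB's chunk format): a fixed-width i64
/// column is a contiguous run of little-endian 8-byte values, no null bitmap.
const WIDTH: usize = 8;

/// Slow path: decode every value, then encode it again into `out`.
fn reencode_column(encoded: &[u8], out: &mut Vec<u8>) {
    assert_eq!(encoded.len() % WIDTH, 0, "truncated column");
    for chunk in encoded.chunks_exact(WIDTH) {
        let mut bytes = [0u8; WIDTH];
        bytes.copy_from_slice(chunk);
        let value = i64::from_le_bytes(bytes); // decode
        out.extend_from_slice(&value.to_le_bytes()); // re-encode
    }
}

/// Fast path: the stored bytes already match the output format,
/// so they are appended with a single bulk copy.
fn copy_column(encoded: &[u8], out: &mut Vec<u8>) {
    assert_eq!(encoded.len() % WIDTH, 0, "truncated column");
    out.extend_from_slice(encoded);
}
```

Both functions produce identical bytes; the fast path simply skips the per-value decode/encode loop, which is the kind of saving a direct copy aims for.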

mapleFU · July 14, 2021, 3:42am

Oh, I found it! Sorry for asking such a stupid question!

skyzh · July 14, 2021, 3:52am

Good question indeed!