Some questions about TiKV Coprocessor

Hi, I’m getting started with reading the TiKV coprocessor code, and I have a couple of naive questions:

  1. Since TiKV now uses the Chunk format to optimize memory usage and query execution, I noticed this in the Chunk data structure:
/// A container stores an array of datums, which can be either raw (not decoded), or decoded into
/// the `VectorValue` type.
///
/// TODO:
/// Since currently the data format in response can be the same as in storage, we use this structure
/// to avoid unnecessary repeated serialization / deserialization. In future, Coprocessor will
/// respond all data in Chunk format which is different to the format in storage. At that time,
/// this structure is no longer useful and should be removed.
#[derive(Clone, Debug)]
pub enum LazyBatchColumn {
    Raw(BufferVec),
    Decoded(VectorValue),
}

I have gone through the code of some scanners, and it seems that Raw(BufferVec) is not used outside of tests and benchmarks. Am I understanding this correctly?
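
As a rough illustration of the lazy-decode idea behind the two variants, here is a minimal sketch. It is not the real TiKV code: `LazyColumn`, `is_raw`, and `ensure_decoded` are hypothetical names, `Vec<Vec<u8>>` and `Vec<Option<i64>>` stand in for `BufferVec` and `VectorValue`, and the row encoding (empty bytes for NULL, 8 big-endian bytes for an integer) is assumed purely for the example.

```rust
/// Simplified stand-in for `LazyBatchColumn`; not the real TiKV definition.
#[derive(Clone, Debug, PartialEq)]
pub enum LazyColumn {
    /// One undecoded byte string per row (stand-in for `BufferVec`).
    Raw(Vec<Vec<u8>>),
    /// Decoded values (stand-in for `VectorValue`, integers only).
    Decoded(Vec<Option<i64>>),
}

impl LazyColumn {
    /// Returns true while the column still holds undecoded bytes.
    pub fn is_raw(&self) -> bool {
        matches!(self, LazyColumn::Raw(_))
    }

    /// Decodes the column in place if it is still raw; does nothing otherwise.
    /// Assumed encoding: empty bytes = NULL, exactly 8 bytes = big-endian i64.
    pub fn ensure_decoded(&mut self) -> Result<(), String> {
        let decoded = match self {
            // Already decoded: nothing to do.
            LazyColumn::Decoded(_) => return Ok(()),
            LazyColumn::Raw(rows) => {
                let mut out = Vec::with_capacity(rows.len());
                for row in rows.iter() {
                    if row.is_empty() {
                        out.push(None);
                    } else if row.len() == 8 {
                        let mut buf = [0u8; 8];
                        buf.copy_from_slice(row);
                        out.push(Some(i64::from_be_bytes(buf)));
                    } else {
                        return Err(format!("expected 8 bytes, got {}", row.len()));
                    }
                }
                out
            }
        };
        // Swap the raw bytes for the decoded values.
        *self = LazyColumn::Decoded(decoded);
        Ok(())
    }
}
```

In this sketch a column that is only passed through to the response can stay `Raw` and never pay the decoding cost, while a column that an expression needs calls `ensure_decoded` first. That matches the intent described in the doc comment above, under the stated assumptions.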

  2. When I went through the IndexScanExecutor, I found some logic related to clustered indexes, but I can’t open the design doc for it: https://docs.google.com/document/d/1Co5iMiaxitv3okJmLYLJxZYCNChcjzswJMRr-_45Eqg/edit?usp=sharing

Direct copying between serialized chunk data and chunk-format vectors is not implemented yet. There were some bugs, and the PR has been put aside :frowning:

  • More efficient data encoding. Directly copy encoded chunk format data from memory to protobuf fields. There were some attempts, but failed due to some unknown bugs.

Oh, I found it! Sorry for asking such a silly question!

Good question indeed!