⚠
The web version only has simple instructions since chapter 04, while the full book has detailed explanations and background info.
0403: Sort Orders
Primary Key Order
KV keys are compared as strings, using bytes.Compare(). Relational data has types, and serialized keys may compare in the wrong order.
One fix is to deserialize keys before comparing, then compare by data type. But this makes KV depend on schema, which adds tight coupling. So some databases use another way: design a serialization format that keeps sort order.
- Rename the old
Cell.Encode()toCell.EncodeVal()for KV values. - Add
Cell.EncodeKey()for KV keys which implements a new format. - Update
Cell.Decode()accordingly.
// New
func (cell *Cell) EncodeKey(toAppend []byte) []byte
func (cell *Cell) DecodeKey(data []byte) (rest []byte, err error)
// Existing
func (cell *Cell) EncodeVal(toAppend []byte) []byte
func (cell *Cell) DecodeVal(data []byte) (rest []byte, err error)Rework Row methods to call the new Cell methods:
func (row Row) EncodeKey(schema *Schema) (key []byte)
func (row Row) DecodeKey(schema *Schema, key []byte) (err error)Order-Preserving Serialization
The following serialization method preserves sort order of int64. See the full book for the explanation.
func (cell *Cell) EncodeKey(out []byte) []byte {
switch cell.Type {
case TypeI64:
return binary.BigEndian.AppendUint64(out, uint64(cell.I64)^(1<<63))
case TypeStr:
// TODO
}
}Floats can be serialized this way too, but note that negative floats differ from integers (see step 0201).
Null-Terminated Strings
Encode strings as null-terminated strings, and use the following escaping scheme to eliminate null bytes. See the full book for the explanation.
0x00⇔0x01 0x010x01⇔0x01 0x02
func encodeStrKey(toAppend []byte, input []byte) []byte {
for _, ch := range input {
if ch == 0x00 || ch == 0x01 {
toAppend = append(toAppend, 0x01, ch+1)
} else {
toAppend = append(toAppend, ch)
}
}
return append(toAppend, 0x00)
}Implement the decode function:
func decodeStrKey(data []byte) (out []byte, rest []byte, err error)Multi-Column Compare
A primary key can have multiple columns. Comparison is like strings, column by column, like tuples in Python. For (a, b) > (c, d), it means a > c || (a == c && b > d). Question: if you serialize columns and join them, does this still work? The answer is yes.
ⓘ
CodeCrafters.io has similar courses in many programming languages, including build your own Redis, SQLite, Docker, etc. It’s worth checking out.