⚠
The web version only has simple instructions since chapter 04, while the full book has detailed explanations and background info.
0405: Range Query
KV Range Query
The Seek() function only supports open-ended range queries (key >= 123):
func (kv *KV) Seek(key []byte) (*KVIterator, error)Now add a close-ended range query API:
func (kv *KV) Range(start, stop []byte, desc bool) (*RangedKVIter, error)The desc flag controls the traversal order and can be used for ORDER BY:
desc == false: querystart <= key && key <= stop.desc == true: querystart >= key && key >= stop.
RangedKVIter just wraps the original KVIterator:
type RangedKVIter struct {
iter KVIterator
stop []byte
desc bool
}
func (iter *RangedKVIter) Key() []byte {
return iter.iter.Key()
}
func (iter *RangedKVIter) Val() []byte {
return iter.iter.Val()
}With a stop condition during iteration:
func (iter *RangedKVIter) Valid() bool {
if !iter.iter.Valid() {
return false
}
r := bytes.Compare(iter.iter.Key(), iter.stop)
if iter.desc && r < 0 {
return false
} else if !iter.desc && r > 0 {
return false
}
return true
}Next() must check the traversal order:
func (iter *RangedKVIter) Next() error {
if !iter.Valid() {
return nil
}
if iter.desc {
return iter.iter.Prev()
} else {
return iter.iter.Next()
}
}Range() also wraps Seek(). Implement it yourself. Note that you may need to adjust the initial iterator position, since Seek() is >=, while Range() can be >= or <=.
func (kv *KV) Range(start, stop []byte, desc bool) (*RangedKVIter, error) {
iter, err := kv.Seek(start)
if err != nil {
return nil, err
}
// TODO
}Prefix Comparison
Now add a similar Range() API to DB. A new issue appears: a primary key or index can consist of multiple columns, and a query may use only a prefix. For example, if the primary key is (a, b, c), these queries are valid:
- (a, b, c) ≤ (123, 4, 5), using the full key. See tuple comparison in step 0403.
- (a, b) ≤ (123, 4), using the (a, b) prefix of (a, b, c), equivalent to (a, b, c) ≤ (123, 4, +∞).
- a ≤ 123, using the (a, ) prefix of (a, b, c), equivalent to (a, b, c) ≤ (123, +∞).
If the prefix (123, 4) is encoded and used for K ≤ (123, 4) in KV, the actual result is (a, b, c) < (123, 4). So we introduce infinity and pad the prefix to a full key. All 4 prefix comparisons can be converted to full key comparisons:
| Prefix | Full |
|---|---|
| (a, b) < (1, 2) | (a, b, c) ≤ (1, 2, −∞) |
| (a, b) ≤ (1, 2) | (a, b, c) ≤ (1, 2, +∞) |
| (a, b) > (1, 2) | (a, b, c) ≥ (1, 2, +∞) |
| (a, b) ≥ (1, 2) | (a, b, c) ≥ (1, 2, −∞) |
If we include infinity in KV key encoding, prefix comparison becomes possible. Add a function to encode key prefixes:
func EncodeKeyPrefix(schema *Schema, prefix []Cell, positive bool) []byteThe positive flag decides whether the suffix is +∞ or −∞. Encode infinity as follows:
- Empty string means −∞, the smallest string. This requires serialized data to never be
"", which is already true. "\xff"means +∞, the max byte. This requires serialized data to never start with"\xff", which can be enforced by prepending 1 byte to each column.
func (row Row) EncodeKey(schema *Schema) []byte {
key := append([]byte(schema.Table), 0x00)
for _, idx := range schema.PKey {
cell := row[idx]
key = append(key, byte(cell.Type)) // avoid 0xff
key = cell.EncodeKey(key)
}
return append(key, 0x00) // > -infinity
}Since −∞ is encoded as an empty string, we need to avoid conflicts between (a, b) and (a, b, −∞) by appending a trailing 0x00 byte, which is greater than −∞.
func EncodeKeyPrefix(schema *Schema, prefix []Cell, positive bool) []byte {
key := append([]byte(schema.Table), 0x00)
for i, cell := range prefix {
key = append(key, byte(cell.Type)) // avoid 0xff
key = cell.EncodeKey(key)
}
if positive {
key = append(key, 0xff) // +infinity
} // -infinity
return key
}DB Range Query
Add a close-ended range query API that supports 4 comparison operators:
type ExprOp uint8
const (
OP_LE ExprOp = 12 // <=
OP_GE ExprOp = 13 // >=
OP_LT ExprOp = 14 // <
OP_GT ExprOp = 15 // >
)
type RangeReq struct {
StartCmp ExprOp // <= >= < >
StopCmp ExprOp
Start []Cell
Stop []Cell
}
func (db *DB) Range(schema *Schema, req *RangeReq) (*RowIterator, error)The traversal direction is decided by StartCmp: >= and > mean ascending, otherwise descending. Start and Stop can be full primary keys or prefixes, and their lengths do not need to match.
Replace the KVIterator in RowIterator with RangedKVIter:
type RowIterator struct {
schema *Schema
iter *RangedKVIter // replaced
valid bool
row Row
}The rest is left to the reader:
func (db *DB) Range(schema *Schema, req *RangeReq) (*RowIterator, error) {
start := EncodeKeyPrefix(schema, req.Start, suffixPositive(req.StartCmp))
stop := EncodeKeyPrefix(schema, req.Stop, suffixPositive(req.StopCmp))
desc := isDescending(req.StartCmp)
iter, err := db.KV.Range(start, stop, desc)
if err != nil {
return nil, err
}
// TODO
}ⓘ
CodeCrafters.io has similar courses in many programming languages, including build your own Redis, SQLite, Docker, etc. It’s worth checking out.