The web version only has simple instructions since chapter 04, while the full book has detailed explanations and background info.

0602: Query SSTable

Reading an SSTable

The previous step effectively serialized []struct{ key []byte; val []byte } to disk. This format can be used directly without deserialization and does not need to be fully loaded into memory. Now we implement reading the n-th KV pair from the array:

type SortedFile struct {
    FileName string
    fp       *os.File
    nkeys    int // added
}

func (file *SortedFile) index(pos int) (key []byte, val []byte, err error) {
    var buf [8]byte
    if _, err = file.fp.ReadAt(buf[:], int64(8+8*pos)); err != nil {
        return nil, nil, err
    }
    // KV offset
    offset := int64(binary.LittleEndian.Uint64(buf[:]))
    if int64(8+8*file.nkeys) > offset {
        return nil, nil, errors.New("corrupted file")
    }
    // read KV
    if _, err = file.fp.ReadAt(buf[:], offset); err != nil {
        return nil, nil, err
    }
    // TODO
}

ReadAt() reads data from a specific file offset, the sibling of WriteAt().

Seek and iterate

Using index(), we then implement binary search and iterators. Because data is on disk, IO errors may occur, so error is returned.

func (file *SortedFile) Seek(key []byte) (SortedKVIter, error) {
    // ...
    iter := &SortedFileIter{file: file, pos: pos}
    // ...
    return iter, nil
}

From now on, prefer returning interfaces instead of concrete types, which avoids tight decoupling. The iterator implementation is similar to the existing KVIterator:

type SortedFileIter struct {
    file *SortedFile
    pos  int
    key  []byte
    val  []byte
}
func (iter *SortedFileIter) Valid() bool {
    return 0 <= iter.pos && iter.pos < iter.file.nkeys
}
func (iter *SortedFileIter) Key() []byte { return iter.key }
func (iter *SortedFileIter) Val() []byte { return iter.val }
func (iter *SortedFileIter) Next() error
func (iter *SortedFileIter) Prev() error

Errors that occur while reading KVs are returned via Prev() and Next(). So after moving pos, the KV must be read and stored in the iterator.

CodeCrafters.io has similar courses in many programming languages, including build your own Redis, SQLite, Docker, etc. It’s worth checking out.