The web version only has simple instructions since chapter 04, while the full book has detailed explanations and background info.

0103: Log Storage

Append-only logs

Like text logs, a database log only appends entries at the end of the file, and never modifies or deletes existing entries. Log entries record every update to the database. For example, a log with 4 records:

operation state
0 {}
1 set k1=x {k1=x}
2 set k2=y {k1=x, k2=y}
3 set k1=z {k1=z, k2=y}
4 del k2 {k1=z}

When the database starts, it reads the log and applies updates in order, producing the final state {k1=z}. This implements incremental updates. This is one reason to use a log. Other uses will appear later.

You may ask: how are old log records removed? This will be solved later: when the log reaches a certain size, it is merged into the main data structure (LSM-Tree or B+Tree).

The log records every state change. Even with concurrent transactions, as long as it is not a distributed system, state changes can be serialized in the log. So logs can be used for replication and rollback (undo). However, a log cannot grow forever, so it cannot be the main storage and is only auxiliary. A log is not strictly required; a database can also be built on copy-on-write data structures with log-like behavior.

Log records

From the example above, the log has 2 types of records: update and delete. So a flag is added to Entry to distinguish them.

type Entry struct {
    key     []byte
    val     []byte
    deleted bool
}

The serialization format becomes:

| key size | val size | deleted | key data | val data |
| 4 bytes  | 4 bytes  | 1 byte  |   ...    |   ...    |

Modify these 2 functions:

func (ent *Entry) Encode() []byte
func (ent *Entry) Decode(r io.Reader) error

Read and write the log file

Struct definition:

type Log struct {
    FileName string
    fp       *os.File
}

func (log *Log) Open() (err error) {
    log.fp, err = os.OpenFile(log.FileName, os.O_RDWR|os.O_CREATE, 0o644)
    return err
}

func (log *Log) Close() error {
    return log.fp.Close()
}

Log IO interfaces:

func (log *Log) Write(ent *Entry) error {
    _, err := log.fp.Write(ent.Encode())
    return err
}

func (log *Log) Read(ent *Entry) (eof bool, err error) {
    err = ent.Decode(log.fp)
    if err == io.EOF {
        return true, nil
    } else if err != nil {
        return false, err
    } else {
        return false, nil
    }
}

When the database starts, it repeatedly calls Log.Read() until the end of file. After that, each update calls Log.Write() to append to the end.

Log.Read() detects end of file using io.EOF (End Of File). So Entry.Decode() must propagate errors returned by io.Reader.

Reading the log

Add this struct to KV:

type KV struct {
    log Log
    mem map[string][]byte
}
func (kv *KV) Open() error {
    if err := kv.log.Open(); err != nil {
        return err
    }
    kv.mem = map[string][]byte{}
    // TODO
}

func (kv *KV) Close() error { return kv.log.Close() }

Implement KV.Open() to load the log into KV.mem. Requirement: later entries override earlier ones.

Writing the log

Modify set and del to add log writes:

func (kv *KV) Set(key []byte, val []byte) (updated bool, err error)
func (kv *KV) Del(key []byte) (deleted bool, err error)

Enter the db_project/0102 directory. Run tests:

go test .

Summary

The log persists incremental updates to disk. But this log still has problems. What happens if the database loses power during use? This is the next issue to address.

CodeCrafters.io has similar courses in many programming languages, including build your own Redis, SQLite, Docker, etc. It’s worth checking out.