◄ prev contents next ►

⚠

The web version only has simple instructions since chapter 04, while the full book has detailed explanations and background info.

0202: Table Schema

Structs

Table name, column names, column types, primary key:

type Schema struct {
    Table string
    Cols  []Column
    PKey  []int // which columns are the primary key?
}
type Column struct {
    Name string
    Type CellType
}

For example, this table:

create table `link` (
    `time` int64 not null,
    `src` string not null,
    `dst` string not null,
    primary key (`src`, `dst`)
);

Is represented as:

    schema := &Schema{
        Table: "link",
        Cols: []Column{
            {Name: "time", Type: TypeI64},
            {Name: "src", Type: TypeStr},
            {Name: "dst", Type: TypeStr},
        },
        PKey: []int{1, 2}, // (src, dst)
    }

Primary Key

The primary key is the unique ID of a row. It is made of 1 or more columns. To operate on a row, you must first find it by the primary key. So the primary key is the K in KV, and non-primary-key columns are stored in V. This lets a KV store implement a relational database (OLTP type).

Besides primary keys, indexes are also built with KV. An index lets you find the primary key K, then use it to get V. Like a book: the page number is the primary key, while the “table of contents” and the “index” are indexes. They help you find the page number, then the content. In some cases, an index already has enough data, so V does not need to be read, like reading only the table of contents.

Both primary keys and indexes are KV. The difference is that the primary key is required. So indexes are also called secondary indexes, and the primary key is called the primary key. Some databases, like SQLite, allow tables without a user-defined primary key. This is because SQLite tables have a hidden auto-generated primary key. User-visible primary keys are just normal indexes, with no real difference between primary and secondary.

Encode a Row as KV

A row in the database is represented as:

type Row []Cell

func (schema *Schema) NewRow() Row {
    return make(Row, len(schema.Cols))
}

Schema.NewRow() returns a row of the right size, not initialized.

Next, implement serialization and deserialization:

func (row Row) EncodeKey(schema *Schema) (key []byte)
func (row Row) EncodeVal(schema *Schema) (val []byte)
func (row Row) DecodeKey(schema *Schema, key []byte) (err error)
func (row Row) DecodeVal(schema *Schema, val []byte) (err error)

Requirements:

Input or output rows must match the schema.
Use Cell.Encode() and Cell.Decode() from the previous step.
Follow the column order defined in the schema.

Key Prefix

A database has many tables, so encoded keys need a prefix to tell tables apart. We’ll use the table name:

func (row Row) EncodeKey(schema *Schema) (key []byte) {
    key = append([]byte(schema.Table), 0x00)
    // TODO
}

A 0x00 separator is added after the table name to avoid key conflicts from name prefixes. For example, table names ab and abc have prefixes "ab\x00" and "abc\x00".

In practice, you can assign an integer ID as the prefix. This saves space and lets you rename tables.

ⓘ

CodeCrafters.io has similar courses in many programming languages, including build your own Redis, SQLite, Docker, etc. It’s worth checking out.

◄ prev contents next ►