⚠
The web version only has simple instructions since chapter 04, while the full book has detailed explanations and background info.
0700: LSM-Tree Introduction
What is an LSM-Tree?
Log-structured Merge Tree and B+Tree are the only 2 choices to build a general-purpose DB. LSM-Tree can be compared with B+Tree, but it is not a concrete data structure. It is a method that replaces “updating” data structures with “merging”. The name is misleading, because its idea has nothing to do with trees or logs.
Our database have 2 levels: MemTable and SSTable. MemTable uses a log for durability, while SSTable is never modified and is only replaced by new files. For queries, results are merged, and the upper MemTable has higher priority. For updates, data goes to the upper level first, then to lower levels.
This 2-level structure can be extended to k levels, which is the LSM-Tree. What each level contains is irrelevant to the LSM-Tree principle. Each level only needs to be a set of sorted KV, which can be an array or a tree. If ordering is not required, it can even be a hashtable. So LSM-Tree is a collection of data structures, not a single one.
level 1 [x=1, z=3]
level 2 [a=8, y=2, z=(deleted)]
...
level k [a=0, b=2, c=3, d=4, z=456]
See the full book for a detailed explanation.
ⓘ
CodeCrafters.io has similar courses in many programming languages, including build your own Redis, SQLite, Docker, etc. It’s worth checking out.