From chapter 04 onward, the web version has only brief instructions, while the full book has detailed explanations and background info.
0301: Tokenizer
string → tokens → struct
An SQL string must be parsed into program data before it can be processed. For example:
select a,b from t where c=1;
Will be represented as:
StmtSelect{
	table: "t",
	cols:  []string{"a", "b"},
	keys:  []NamedCell{{column: "c", value: Cell{Type: TypeI64, I64: 1}}},
}
SQL is similar to English, with its own words and grammar. In programming languages, words are called tokens. Before grammar parsing, the string is split into tokens. This step is called tokenizing or lexing.
SQL tokens can be grouped as:
- Keywords: select, from, etc.
- Names: table names, column names, etc.
- Symbols: "=", ";", ",", etc.
- Numbers, strings, etc.
Each type has different rules and is coded in different functions.
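To make the token groups concrete, here is a minimal sketch that splits the example query into tokens. The `tokenize` and `isWordChar` functions are hypothetical names used only for this illustration; the parser in this chapter does not build a token list but instead tries one token type at a time. The sketch assumes ASCII input, treats numbers as words, and does not handle quoted strings.

```go
package main

import "fmt"

// isWordChar reports whether ch can appear in a keyword, name, or number.
// Hypothetical helper for this illustration only.
func isWordChar(ch byte) bool {
	return ch == '_' || ('0' <= ch && ch <= '9') || ('a' <= (ch|32) && (ch|32) <= 'z')
}

// tokenize splits s into words (keywords, names, numbers) and
// single-character symbols. Whitespace only separates tokens.
// Assumption: ASCII input, no quoted strings.
func tokenize(s string) []string {
	var tokens []string
	i := 0
	for i < len(s) {
		ch := s[i]
		switch {
		case ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r':
			i++ // skip whitespace
		case isWordChar(ch):
			start := i
			for i < len(s) && isWordChar(s[i]) {
				i++
			}
			tokens = append(tokens, s[start:i])
		default:
			tokens = append(tokens, s[i:i+1]) // one-character symbol
			i++
		}
	}
	return tokens
}

func main() {
	fmt.Printf("%q\n", tokenize("select a,b from t where c=1;"))
	// ["select" "a" "," "b" "from" "t" "where" "c" "=" "1" ";"]
}
```

Note that the spaces around keywords and the absence of spaces around "," and "=" produce the same kind of token boundary, which is why the grammar step never needs to look at whitespace.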
Syntax Parser
Most parsing works by consuming tokens from left to right and building data structures. So we need to track the current position in the string.
type Parser struct {
	buf string
	pos int
}

func NewParser(s string) Parser {
	return Parser{buf: s, pos: 0}
}
Parse Names (table names, column names)
func (p *Parser) tryName() (string, bool)
Requirements:
- Skip leading spaces.
- First char is a letter or _, following chars are letters, digits, or _.
- On success, return true and advance pos.
- On failure, return false and keep pos.
For example, given Parser{buf: " hi ", pos: 0}, tryName() returns ("hi", true) and leaves pos = 3.
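The requirements above can be sketched as follows. This is one possible implementation rather than the book's reference solution. It scans with local indexes and only writes `pos` on success, so a failed attempt leaves the parser untouched. The `Parser` type and the character helpers are repeated from this chapter so that the block compiles on its own.

```go
package main

import "fmt"

type Parser struct {
	buf string
	pos int
}

func NewParser(s string) Parser {
	return Parser{buf: s, pos: 0}
}

func isSpace(ch byte) bool {
	switch ch {
	case '\t', '\n', '\v', '\f', '\r', ' ':
		return true
	}
	return false
}

func isAlpha(ch byte) bool { return 'a' <= (ch|32) && (ch|32) <= 'z' }
func isDigit(ch byte) bool { return '0' <= ch && ch <= '9' }

func isNameStart(ch byte) bool    { return isAlpha(ch) || ch == '_' }
func isNameContinue(ch byte) bool { return isAlpha(ch) || isDigit(ch) || ch == '_' }

// tryName is a sketch of the exercise, not the book's solution.
func (p *Parser) tryName() (string, bool) {
	// Skip leading spaces using a local index so pos stays unchanged on failure.
	start := p.pos
	for start < len(p.buf) && isSpace(p.buf[start]) {
		start++
	}
	// The first character must be a letter or underscore.
	if start >= len(p.buf) || !isNameStart(p.buf[start]) {
		return "", false
	}
	// Consume letters, digits, and underscores.
	end := start + 1
	for end < len(p.buf) && isNameContinue(p.buf[end]) {
		end++
	}
	p.pos = end // commit only on success
	return p.buf[start:end], true
}

func main() {
	p := NewParser(" hi ")
	name, ok := p.tryName()
	fmt.Println(name, ok, p.pos) // hi true 3

	q := NewParser(" 1abc")
	name, ok = q.tryName()
	fmt.Println(name == "", ok, q.pos) // true false 0
}
```

Committing `pos` only at the end lets a caller try several alternatives in sequence (a keyword, then a name, then a number) without saving and restoring the position itself.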
Use these helpers:
func isSpace(ch byte) bool {
	switch ch {
	case '\t', '\n', '\v', '\f', '\r', ' ':
		return true
	}
	return false
}

func isAlpha(ch byte) bool {
	// ch|32 sets the ASCII lowercase bit, so 'A'..'Z' map onto 'a'..'z'.
	return 'a' <= (ch|32) && (ch|32) <= 'z'
}

func isDigit(ch byte) bool {
	return '0' <= ch && ch <= '9'
}

func isNameStart(ch byte) bool {
	return isAlpha(ch) || ch == '_'
}

func isNameContinue(ch byte) bool {
	return isAlpha(ch) || isDigit(ch) || ch == '_'
}
Parse Keywords
func (p *Parser) tryKeyword(kw string) bool
Requirements:
- Skip leading spaces.
- Match the keyword, case-insensitive. On success, advance pos and return true.
- Otherwise, return false.
- Keywords must be separated by space or punctuation.
Use this to detect separators:
func isSeparator(ch byte) bool {
	return ch < 128 && !isNameContinue(ch)
}