Write-Your-Own-Lua-VM-Compiler-and-Standard-Library-Reading-Notes.md

Chapter 1, 准备工作

Chapter 2, 二进制 Chunk

2.1 什么是二进制 Chunk

即Lua预编译形成的虚拟机执行的字节码. 叫做预编译 Chunk (Precomplied Chunk)二进制 Chunk (Binary Chunk).

2.2 luac 命令介绍

2.2.1 编译 Lua 源文件

Lua 编译器以函数为单位进行编译, 每个函数都会被编译为一个内部结构, 称为 原型 (Prototype).
原型包括:

函数原型是一种递归结构, 并且 Lua 源码中函数的嵌套关系会直接反映在编译后的原型里.

在编译过程, 暴露在全局的语句会被包装进 main 函数. 这是自动完成的.

2.2.2 查看二进制 chunk

这是 main 的:

[root@m01 c1]# luac -l hello.out

main <./hello.lua:0,0> (4 instructions, 16 bytes at 0x173c530)
0+ params, 2 slots, 0 upvalues, 0 locals, 2 constants, 0 functions
	1	[1]	GETGLOBAL	0 -1	; print
	2	[1]	LOADK    	1 -2	; "hello"
	3	[1]	CALL     	0 2 1
	4	[1]	RETURN   	0 1

这是 foo-bar 的:

[root@m01 c1]# luac -l foobar.lua 

main <foobar.lua:0,0> (3 instructions, 12 bytes at 0xb74530)
0+ params, 2 slots, 0 upvalues, 0 locals, 1 constant, 1 function
	1	[3]	CLOSURE  	0 0	; 0xb74710
	2	[1]	SETGLOBAL	0 -1	; foo
	3	[3]	RETURN   	0 1

function <foobar.lua:1,3> (3 instructions, 12 bytes at 0xb74710)
0 params, 2 slots, 0 upvalues, 0 locals, 1 constant, 1 function
	1	[2]	CLOSURE  	0 0	; 0xb748c0
	2	[2]	SETGLOBAL	0 -1	; bar
	3	[3]	RETURN   	0 1

function <foobar.lua:2,2> (1 instruction, 4 bytes at 0xb748c0)
0 params, 2 slots, 0 upvalues, 0 locals, 0 constants, 0 functions
	1	[2]	RETURN   	0 1
[root@m01 c1]# 

名称 <所在文件: 开始行号, 结束行号> (指令数量, 函数地址)
固定参数数量 (如果有+, 则代表是 vararg 函数), 运行函数所需寄存器数量, upvalue 数量, 局部变量数量, 常量数量, 子函数数量.
指令序号, 对应行号, 操作码, 操作数, ; 后是注释.

使用 -l -l 可以查看详情, 包括常量列表, 局部变量, upvalue 表:

[root@m01 c1]# luac -l -l hello.lua 

main <hello.lua:0,0> (4 instructions, 16 bytes at 0x696530)
0+ params, 2 slots, 0 upvalues, 0 locals, 2 constants, 0 functions
	1	[1]	GETGLOBAL	0 -1	; print
	2	[1]	LOADK    	1 -2	; "hello"
	3	[1]	CALL     	0 2 1
	4	[1]	RETURN   	0 1
constants (2) for 0x696530:
	1	"print"
	2	"hello"
locals (0) for 0x696530:
upvalues (0) for 0x696530:
[root@m01 c1]# 

2.3 二进制 chunk 格式

Lua 二进制 chunk 一些细节事项:

2.3.1 数据类型

二进制 chunk 内部使用的数据类型大致可分为 数字, 字符串, 列表 三种.

1. 数字

数据类型 C 语言类型 Go 语言类型 占用字节
字节 lu_Byte (unsigned char) byte 1
C 语言整型 int uint32 4
C 语言 size_t 类型 size_t uint64 8
Lua 整型 lua_Integer (long long) int64 8
Lua 浮点型 lua_Number (double) float64 8

2. 字符串

字符串在二进制 chunk 里面是字节数组, 并记录长度.

3. 列表

指令表, 常量表, 子函数原型表等均使用列表存储.

先用 cint 类型记录列表长度, 然后存储 n 个列表元素.

2.3.2 总体结构

用于抽象的 binaryChunk 结构为:

package binchunk

type binaryChunk struct {
    header                  // 头部
    sizeUpvalues byte       // 主函数 upvalue
    mainFunc     *Prototype // 主函数原型
}

2.3.3 头部

二进制 chunk 头部:

type header struct
    signature          [4]byte
    version            byte
    format             byte
    luacData           [6]byte
    cintByte           byte
    sizetSize          byte
    instructionSize    byte
    luaIntegerSize     byte
    luaNumberSize      byte
    luacInt            int64
    luacNum            float64
}

1. 签名

Lua 的二进制起始签名 (魔数, Magic Number) 是 ESC, L, u, a 的 ASCII码. 即 0x1B4C7561, Go 字面量为 "\x1bLua".

[root@m01 c1]# xxd -u -g 1 ./hello.out 
0000000: 1B 4C 75 61 51 00 01 04 08 04 08 00 0D 00 00 00  .LuaQ...........
0000010: 00 00 00 00 40 2E 2F 68 65 6C 6C 6F 2E 6C 75 61  ....@./hello.lua
0000020: 00 00 00 00 00 00 00 00 00 00 00 02 02 04 00 00  ................
0000030: 00 05 00 00 00 41 40 00 00 1C 40 00 01 1E 00 80  .....A@...@.....
0000040: 00 02 00 00 00 04 06 00 00 00 00 00 00 00 70 72  ..............pr
0000050: 69 6E 74 00 04 06 00 00 00 00 00 00 00 68 65 6C  int..........hel
0000060: 6C 6F 00 00 00 00 00 04 00 00 00 01 00 00 00 01  lo..............
0000070: 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 00  ................
0000080: 00 00 00    

2. 版本号

紧跟在二进制签名后, 仅包含大版本号 (Major Version) 和 小版本号(Minor Version). lua 5.3.4 即 5 x 16 + 3 = 83, 即 0x53.

3. 格式号

lua 虚拟机定义的格式号即接下来一个字节, 0.

4. LUAC_DATA

同样用来二进制检查, "\x19\x93\r\n\x1a\n".

5. 整数和 Lua 虚拟机指令宽度

接下来5个字节记录 cint, size_t, lua 虚拟机指令, lua 整数, lua 浮点数 这5种数据类型在二进制 chunk 占用的字节数.

6. LUAC_INT

接下来存放的是整数值 0x5678 用于检查大小端.

7. LUAC_NUM

然后存放浮点数 370.5 用以检测浮点数格式.

2.3.4 函数原型

函数原型结构体为:

type Prototype struct {
    Source              string
    LineDefined         uint32
    LastLineDefined     uint32
    NumParams           byte
    IsVararg            byte
    MaxStackSize        byte
    Code                []uint32
    Constants           []interface{}
    Upvalues            []Upvalue
    Protos              []*Prototype
    LineInfo            []uint32
    LocVars             []LocVar
    UpvalueNames        []string


}

1. 源文件名

只有在主函数中, 此字段才有值. 格式为长度+1和字符串. 字符串 @ 开头代表从 lua 源文件编译而来. =stdin 则代表 从 stdin 编译.

2. 起止行号

记录原型对应起止行号.

3. 固定参数个数

相对于变长参数.

4. 是否是 Vararg 函数

即是否有变长参数.

5. 寄存器数量

也被称作 MaxStackSize.

6. 指令表

每条指令是 4 字节.

7. 常量表

二进制 chunk 常量 tag 值:

tag Lua 字面量类型 存储类型
0x00 nil 不存储
0x01 boolean 字节 (0, 1)
0x03 number Lua 浮点数
0x13 integer Lua 整数
0x04 string 短字符串
0x14 string 长字符串

8. Upvalue 表

每个元素占两个字节.

9. 子函数原型表

10. 行号表

行号表中的行号与指令表中的指令一一对应, 分别记录每条指令在源代码中对应的行号.

11. 局部变量表

每个元素都包含变量名 (按字符串类型存储), 起止指令索引 (按 cint 类型存储).

12. Upvalue 名列表

与 Upvalue 表一一对应, 用于记录 Upvalue 在源代码中的名字.

13. Undump() 函数

写了个校验头部的函数.

2.4 解析二进制 chunk

2.4.1 读取基本数据类型

这里实现了:

func (self *reader) readByte() byte {
    b := self.data[0]
    self.data = self.data[1:]
    return b
}

其余函数都基于类似 readByte() 的模式进行调整:

func (self *reader) read?() byte {
    i := binary.LittleEndian.?()
    self.data = self.data[?:] // 这里进行 maring
    return i
}

readString() 则更复杂些, 需要根据字符串长度进行判断.

2.4.2 检查头部

检查头部标志.

func (r *reader) checkHeader() {
	if string(r.readBytes(4)) != LUA_SIGNATURE {
		panic("not a precompiled chunk")
	}
	if r.readByte() != LUAC_VERSION {
		panic("version mismatch!")
	}
	if r.readByte() != LUAC_FORMAT {
		panic("format mismatch!")
	}
	if string(r.readBytes(6)) != LUAC_DATA {
		panic("corrupted!")
	}
	if r.readByte() != CINT_SIZE {
		panic("int size mismatch!")
	}
	if r.readByte() != CSIZET_SIZE {
		panic("size_t size mismatch!")
	}
	if r.readByte() != INSTRUCTION_SIZE {
		panic("instruction size mismatch!")
	}
	if r.readByte() != LUA_INTEGER_SIZE {
		panic("lua_Integer size mismatch!")
	}
	if r.readByte() != LUA_NUMBER_SIZE {
		panic("lua_Number size mismatch!")
	}
	if r.readLuaInteger() != LUAC_INT {
		panic("endianness mismatch!")
	}
	if r.readLuaNumber() != LUAC_NUM {
		panic("float format mismatch!")
	}
}

2.4.3 读取函数原型

其中 readProto() 用于读取函数原型, 用于填充 Prototype struct. 其中读取 Byte 部分可以直接使用 readByte(), 而其他较长或复杂部分还需要基于之前定义的 read*() 函数进行包装. 包括:

2.5 测试本章代码

本节实现了类似 luac -l -l 的 Lua chunk 解析功能. 具体看源代码即可, 没有复杂内容.

Chapter 3, 指令集

3.1 指令集介绍

基于栈的虚拟机需要使用 PUSH, POP 类指令操作栈. 因此指令集相对较大, 但指令平均长度较短. 基于寄存器的虚拟机可以直接二对寄存器进行寻址, 因此指令集相对较小, 但由于需要把寄存器递指编码进指令, 因此指令平均长度比较长.

Lua 虚拟机采用定长指令集, 每条指令占4个字节 (32b), 6b 用于 Opcode (操作码), 其余 26b 用于 Operand (操作数).

3.2 指令编码格式

讨论编码模式, 操作数, 操作码.

3.2.1 编码模式

Lua 虚拟机指令可以分为4类, 分别对应4种编码模式:

在 Lua 虚拟机总计 47 条指令中, 39 条使用 iABC 模式. 剩余 iABx (3), iAsBx (4), iAx (1).

3.2.2 操作码

总计 47 个.

3.2.3 操作数

操作数 A 主要用于表示目标寄存器索引. 其他操作数则可分为 4 种类型:

3.2.4 指令表

opcode 结构体:

type opcode struct {
	testFlag byte // operator is a test (next instruction must be a jump)
	setAFlag byte // instruction set register A
	argBMode byte // B arg mode
	argCMode byte // C arg mode
	opMode   byte // op mode
	name     string
	action   func(i Instruction, vm api.LuaVM)
}

然后源文件 lua.go/vm/opcodes.go: var opcodes 定义了所有 opcode. 格式例如:

//     T  A  B       C       mode  name      
opcode{0, 1, OpArgR, OpArgN, IABC, "MOVE    "}

对应上述结构体.

3.3 指令解码

这里使用了按照操作数或操作码长度进行按位与操作, 以便从 32bit 长度中提取出所需长度的值.

其中 iAsBx 提取会复杂一些:

func (instr Instruction) ABx() (a, bx int) {
	a = int(instr >> 6 & 0xFF)
	bx = int(instr >> 14)
	return
}

func (instr Instruction) AsBx() (a, sbx int) {
	a, bx := instr.ABx()
	return a, bx - MAXARG_sBx
}

用了 MAXARG_sBx 作为 padding 进行位操作.

3.4 测试本章代码

Chapter 4, Lua API

4.1 Lua API 介绍

lua_State 封装了 lua 解释器.

4.1 Lua 栈

lua state 是宿主语言和 lua 沟通得桥梁, Lua API 函数其中有很大ui部分是专门用来操作 Lua 栈的.

本节介绍 lua state 栈里能够存放哪些值, 然后介绍如何按照索引来存取这些值, 最后编码实现.

4.2.1 Lua 数据类型和值

Lua 是动态类型语言, 在 Lua 代码里, 变量是不携带类型信息的, 变量的值才携带类型信息.

Lua 内置了 8 种数据类型:

print(type(nil))                        -- nil
print(type(true))                       -- boolean
print(type(3.14))                       -- number
print(type("hello"))                    -- string
print(type({}))                         -- table
print(type(print))                      -- function
print(type(coroutine.create(print)))    -- thread
print(type(io.stdin))                   -- userdata

字符串是简单的字节数组. 前面几种基本类型也可以很方便映射, 映射表如下:

Lua 类型 Go 类型
nil nil
boolean bool
integer int64
float float64
string string

首先定义类型常量:

/* basic types */
const (
	LUA_TNONE = iota - 1 // -1
	LUA_TNIL
	LUA_TBOOLEAN
	LUA_TLIGHTUSERDATA
	LUA_TNUMBER
	LUA_TSTRING
	LUA_TTABLE
	LUA_TFUNCTION
	LUA_TUSERDATA
	LUA_TTHREAD
)

其中多出来的 LUA_TNONE 用于针对栈索引的无效值.

然后是对应类型的值类型 luaValue :

type luaValue interface{}

实现 typeOf(), 这里定义了 luaValue 里面有个 type 属性:

func typeOf(val luaValue) LuaType {
	switch val.(type) {
	case nil:
		return LUA_TNIL
	case bool:
		return LUA_TBOOLEAN
	case int64, float64:
		return LUA_TNUMBER
	case string:
		return LUA_TSTRING
	case *luaTable:
		return LUA_TTABLE
	case *closure:
		return LUA_TFUNCTION
	case *luaState:
		return LUA_TTHREAD
	case *userdata:
		return LUA_TUSERDATA
	default:
		panic("unknown type!")
	}
}

4.2.2 栈索引

这里描述了 Lua 栈的设计:

4.2.3 定义 luaStack 结构体

luaStack 实现:

type luaStack struct {
	/* virtual stack */
	slots []luaValue
	top   int
	/* call info */
	state   *luaState
	closure *closure
	varargs []luaValue
	openuvs map[int]*upvalue
	pc      int
	/* linked list */
	prev *luaStack
}

slots 存储值, 即 luaValue. top 记录栈顶索引.

newLuaStack() 用于创建指定容量的栈, 实现:

func newLuaStack(size int, state *luaState) *luaStack {
	return &luaStack{
		state: state,
		slots: make([]luaValue, size),
	}
}

check() 方法检查栈的空闲空间是否还可以容纳 n 个值. 如果不满足则进行扩容. (这里总感觉是个坏味道? check 还能扩容?)

func (stack *luaStack) check(n int) {
	free := len(stack.slots) - stack.top
	for i := free; i < n; i++ {
		stack.slots = append(stack.slots, nil)
	}
}

push() 将值压入栈顶, 如果溢出则 painc().

func (stack *luaStack) push(val luaValue) {
	if stack.top == len(stack.slots) {
		panic("stack overflow!")
	}
	stack.slots[stack.top] = val
	stack.top++
}

pop() 则弹出, 如果为空则 painc().

func (stack *luaStack) pop() luaValue {
	if stack.top < 1 {
		panic("stack underflow!")
	}
	stack.top--
	val := stack.slots[stack.top]
	stack.slots[stack.top] = nil
	return val
}

absIndex() 将索引转换为绝对索引.

/*
@@ LUAI_MAXSTACK limits the size of the Lua stack.
** CHANGE it if you need a different limit. This limit is arbitrary;
** its only purpose is to stop Lua from consuming unlimited stack
** space (and to reserve some numbers for pseudo-indices).
*/
const LUAI_MAXSTACK = 1000000
/*
** Pseudo-indices
** (-LUAI_MAXSTACK is the minimum valid index; we keep some free empty
** space after that to help overflow detection)
 */
const LUA_REGISTRYINDEX = -LUAI_MAXSTACK - 1000

即定义了一个最大达到 -LUAI_MAXSTACK - 1000 的区间用于操作 upvalue .

func (stack *luaStack) absIndex(idx int) int {
	// zero or positive or pseudo
	if idx >= 0 || idx <= LUA_REGISTRYINDEX {
		return idx
	}
	// negative
	return idx + stack.top + 1
}

isValid() 判断索引是否有效:

func (stack *luaStack) isValid(idx int) bool {
	if idx < LUA_REGISTRYINDEX { /* upvalues */
		uvIdx := LUA_REGISTRYINDEX - idx - 1
		c := stack.closure
		return c != nil && uvIdx < len(c.upvals)
	}
	if idx == LUA_REGISTRYINDEX {
		return true
	}
	absIdx := stack.absIndex(idx)
	return absIdx > 0 && absIdx <= stack.top
}

get() 依据索引从栈取值.

func (stack *luaStack) get(idx int) luaValue {
	if idx < LUA_REGISTRYINDEX { /* upvalues */
		uvIdx := LUA_REGISTRYINDEX - idx - 1
		c := stack.closure
		if c == nil || uvIdx >= len(c.upvals) {
			return nil
		}
		return *(c.upvals[uvIdx].val)
	}

	if idx == LUA_REGISTRYINDEX {
		return stack.state.registry
	}

	absIdx := stack.absIndex(idx)
	if absIdx > 0 && absIdx <= stack.top {
		return stack.slots[absIdx-1]
	}
	return nil
}

set() 写入值.

func (stack *luaStack) set(idx int, val luaValue) {
	if idx < LUA_REGISTRYINDEX { /* upvalues */
		uvIdx := LUA_REGISTRYINDEX - idx - 1
		c := stack.closure
		if c != nil && uvIdx < len(c.upvals) {
			*(c.upvals[uvIdx].val) = val
		}
		return
	}

	if idx == LUA_REGISTRYINDEX {
		stack.state.registry = val.(*luaTable)
		return
	}

	absIdx := stack.absIndex(idx)
	if absIdx > 0 && absIdx <= stack.top {
		stack.slots[absIdx-1] = val
		return
	}
	panic("invalid index!")
}

4.3 Lua State

Lua State 封装了整个 Lua 解释器状态.

4.3.1 定义 LuaState 接口

type LuaState interface {
	BasicAPI
	DebugAPI
	AuxLib
	String() string // debug
}

type BasicAPI interface {
	/* state manipulation */
	Close()                               //
	AtPanic(panicf GoFunction) GoFunction // set panic function
	Version() float64                     // get version number
	/* basic stack manipulation */
	GetTop() int              // stack.top
	AbsIndex(idx int) int     // abs(idx)
	UpvalueIndex(idx int) int //
	CheckStack(n int) bool    //
	Pop(n int)                // pop(n)
	Copy(fromIdx, toIdx int)  // r[toIdx] = r[fromidx]
	PushValue(idx int)        // push(r[idx])
	Replace(idx int)          // r[idx] = pop()
	Insert(idx int)           // r[idx, -1] >> 1
	Remove(idx int)           // remove(r[idx])
	Rotate(idx, n int)        // r[idx, -1] >> n
	SetTop(idx int)           // stack.top = idx
	XMove(to LuaState, n int) // to.push(pop(n))
	/* access functions (stack -> C) */
	TypeName(tp LuaType) string        // r[idx].type.name
	Type(idx int) LuaType              // r[idx].type
	IsNone(idx int) bool               // r[idx].type == LUA_TNONE
	IsNil(idx int) bool                // r[idx].type == LUA_TNIL
	IsNoneOrNil(idx int) bool          // r[idx].type == LUA_TNONE || LUA_TNIL
	IsBoolean(idx int) bool            // r[idx].type == LUA_TBOOLEAN
	IsInteger(idx int) bool            // r[idx].type == LUA_TINTEGER
	IsNumber(idx int) bool             // r[idx] ~= LuaNumber
	IsString(idx int) bool             // r[idx] ~= LuaString
	IsTable(idx int) bool              // r[idx].type == LUA_TTABLE
	IsThread(idx int) bool             // r[idx].type == LUA_TTHREAD
	IsFunction(idx int) bool           // r[idx].type == LUA_TFUNCTION
	IsGoFunction(idx int) bool         // r[idx].type == LUA_TLCL || LUA_TGCL
	IsUserData(idx int) bool           // r[idx].type == LUA_TUSERDATA
	ToBoolean(idx int) bool            // r[idx] as bool
	ToInteger(idx int) int64           // r[idx] as LuaInteger
	ToIntegerX(idx int) (int64, bool)  // r[idx] as LuaInteger
	ToNumber(idx int) float64          // r[idx] as LuaNumber
	ToNumberX(idx int) (float64, bool) // r[idx] as LuaNumber
	ToString(idx int) string           // r[idx] as string
	ToStringX(idx int) (string, bool)  // r[idx] as string
	ToGoFunction(idx int) GoFunction   // r[idx] as GoFunction
	ToThread(idx int) LuaState         // r[idx] as LuaThread
	ToUserData(idx int) UserData       // r[idx] as UserData
	ToPointer(idx int) interface{}     // r[idx] as interface{}
	RawLen(idx int) uint               // len(r[idx])
	/* push functions (C -> stack) */
	PushNil()                                        // push(nil)
	PushBoolean(b bool)                              // push(b)
	PushInteger(n int64)                             // push(n)
	PushNumber(n float64)                            // push(n)
	PushString(s string)                             // push(s)
	PushFString(fmt string, a ...interface{}) string // push(fmt*a)
	PushGoFunction(f GoFunction)                     // push(f)
	PushGoClosure(f GoFunction, n int)               // push(f)
	PushLightUserData(d UserData)                    // push(d)
	PushGlobalTable()                                // push(global)
	PushThread() bool                                // push(thread)
	/* Comparison and arithmetic functions */
	Arith(op ArithOp)                          // b=pop(); a=pop(); push(a op b)
	Compare(idx1, idx2 int, op CompareOp) bool // r[idx1] op r[idx2]
	RawEqual(idx1, idx2 int) bool              // r[idx1] == r[idx2]
	/* get functions (Lua -> stack) */
	NewTable()                           // push({})
	CreateTable(nArr, nRec int)          // push({})
	GetTable(idx int) LuaType            // push(r[idx][pop()])
	GetField(idx int, k string) LuaType  // push(r[idx][k])
	GetI(idx int, i int64) LuaType       // push(r[idx][i])
	RawGet(idx int) LuaType              // push(r[idx][pop()])
	RawGetI(idx int, i int64) LuaType    // push(r[idx][i])
	RawGetP(idx int, p UserData) LuaType // push(r[idx][p])
	GetGlobal(name string) LuaType       // push(global[name])
	GetMetatable(idx int) bool           // push(r[idx].metatable)?
	GetUserValue(idx int) LuaType        // push(r[idx].userValue)
	/* set functions (stack -> Lua) */
	SetTable(idx int)                   // v=pop(); k=pop(); r[idx][k] = v
	SetField(idx int, k string)         // r[idx][k] = pop()
	SetI(idx int, i int64)              // r[idx][i] = pop()
	RawSet(idx int)                     // v=pop(); k=pop(); r[idx][k] = v
	RawSetI(idx int, i int64)           // r[idx][i] = pop()
	RawSetP(idx int, p UserData)        // r[idx][p] = pop()
	Register(name string, f GoFunction) // global[name] = f
	SetGlobal(name string)              // global[name] = pop()
	SetMetatable(idx int)               // r[idx].metatable = pop()
	SetUserValue(idx int)               // r[idx].userValue = pop()
	/* 'load' and 'call' functions (load and run Lua code) */
	Dump(strip bool) []byte                                 // todo
	Load(chunk []byte, chunkName, mode string) ThreadStatus // push(compile(chunk))
	Call(nArgs, nResults int)                               // args=pop(nArgs); f=pop(); f(args)
	CallK()                                                 //
	PCall(nArgs, nResults, msgh int) ThreadStatus           // call(nArgs, nResults) || push(err)
	PCallK()                                                //
	/* miscellaneous functions */
	Concat(n int)                 // push(concat(pop(n)))
	Len(idx int)                  // push(len(r[idx]))
	Next(idx int) bool            // key=pop(); k,v=next(r[idx]); push(k,v);
	StringToNumber(s string) bool // push(number(s))
	Error() int                   // panic(r[-1])
	/* coroutine functions */
	NewThread() LuaState                          // todo
	Resume(from LuaState, nArgs int) ThreadStatus // todo
	Yield(nResults int) int                       // todo
	YieldK()                                      // todo
	Status() ThreadStatus                         // todo
	IsYieldable() bool                            // todo
	/* garbage-collection function and options */
	GC(what, data int) int //
}

4.3.2 定义 luaState 结构体

实现 luaState

type luaState struct {
	/* global state */
	hook     LuaHook
	hookMask int
	panicf   GoFunction
	registry *luaTable
	/* stack */
	stack     *luaStack
	callDepth int
	/* coroutine */
	coStatus ThreadStatus
	coCaller *luaState
	coChan   chan int
}

New() 创建 luaState().

func New() LuaState {
	ls := &luaState{}

	registry := newLuaTable(8, 0)
	registry.put(LUA_RIDX_MAINTHREAD, ls)
	registry.put(LUA_RIDX_GLOBALS, newLuaTable(0, 20))

	ls.registry = registry
	ls.pushLuaStack(newLuaStack(LUA_MINSTACK, ls))
	return ls
}

4.3.3 基础栈操作方法

GetTop() 反获栈顶索引.

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_gettop
// lua-5.3.4/src/lapi.c#lua_gettop()
func (state *luaState) GetTop() int {
	return state.stack.top
}

AbsIndex() 把索引转换为绝对索引.

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_absindex
// lua-5.3.4/src/lapi.c#lua_absindex()
func (state *luaState) AbsIndex(idx int) int {
	return state.stack.absIndex(idx)
}

CheckStack() 进行检查如果不足则扩容.

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_checkstack
// lua-5.3.4/src/lapi.c#lua_checkstack()
func (state *luaState) CheckStack(n int) bool {
	state.stack.check(n)
	return true // never fails
}

Pop() 从栈顶弹出 n 个值.

// [-n, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_pop
// lua-5.3.4/src/lua.h#lua_pop()
func (state *luaState) Pop(n int) {
	for i := 0; i < n; i++ {
		state.stack.pop()
	}
}

Copy() 把值从一个位置复制到另一个位置 (栈内复制).

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_copy
// lua-5.3.4/src/lapi.c#lua_copy()
func (state *luaState) Copy(fromIdx, toIdx int) {
	val := state.stack.get(fromIdx)
	state.stack.set(toIdx, val)
}

PushValue() 把指定索引处的值推入栈顶.

// [-0, +1, –]
// http://www.lua.org/manual/5.3/manual.html#lua_pushvalue
// lua-5.3.4/src/lapi.c#lua_pushvalue()
func (state *luaState) PushValue(idx int) {
	val := state.stack.get(idx)
	state.stack.push(val)
}

Replace() 是 PushValue() 的反操作, 将栈顶值弹出并写入指定位置.

// [-1, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_replace
// lua-5.3.4/src/lua.h#lua_replace()
func (state *luaState) Replace(idx int) {
	val := state.stack.pop()
	state.stack.set(idx, val)
}

Insert() 将栈顶值弹出, 并插入指定位置. 是 Rotate() 的一个特例.

// [-1, +1, –]
// http://www.lua.org/manual/5.3/manual.html#lua_insert
// lua-5.3.4/src/lua.h#lua_insert()
func (state *luaState) Insert(idx int) {
	state.Rotate(idx, 1)
}

Remove() 删除指定索引处的值. 可以用 Rotate() 组合 Pop() 实现.

// [-1, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_remove
// lua-5.3.4/src/lua.h#lua_remove()
func (state *luaState) Remove(idx int) {
	state.Rotate(idx, -1)
	state.Pop(1)
}

Rotate() 方法将 [idx, top] 索引区间内的值朝栈顶方向旋转 n 个位置, 如果 n 是负数, 那么朝栈底方向旋转. 可以理解为履带式滚动位置.

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_rotate
// lua-5.3.4/src/lapi.c#lua_rotate()
func (state *luaState) Rotate(idx, n int) {
	t := state.stack.top - 1           /* end of stack segment being rotated */
	p := state.stack.absIndex(idx) - 1 /* start of segment */
	var m int                          /* end of prefix */
	if n >= 0 {
		m = t - n
	} else {
		m = p - n - 1
	}
	state.stack.reverse(p, m)   /* reverse the prefix with length 'n' */
	state.stack.reverse(m+1, t) /* reverse the suffix */
	state.stack.reverse(p, t)   /* reverse the entire segment */
}

Reverse() 实现:

func (stack *luaStack) reverse(from, to int) {
	slots := stack.slots
	for from < to {
		slots[from], slots[to] = slots[to], slots[from]
		from++
		to--
	}
}

SetTop() 将栈顶索引设置为指定值. 如果指定值小于当前栈顶索引, 效果则相当于弹出 (指定为 0 则是清空). 如果大于当前栈顶索引, 则相当于插入 nil.

// [-?, +?, –]
// http://www.lua.org/manual/5.3/manual.html#lua_settop
// lua-5.3.4/src/lapi.c#lua_settop()
func (state *luaState) SetTop(idx int) {
	newTop := state.stack.absIndex(idx)
	if newTop < 0 {
		panic("stack underflow!")
	}

	n := state.stack.top - newTop
	if n > 0 {
		for i := 0; i < n; i++ {
			state.stack.pop()
		}
	} else if n < 0 {
		for i := 0; i > n; i-- {
			state.stack.push(nil)
		}
	}
}

4.3.4 Push 方法

这里的 Push() 方法将外部值推入栈顶.

// [-0, +1, –]
// http://www.lua.org/manual/5.3/manual.html#lua_pushnil
func (state *luaState) PushNil() {
	state.stack.push(nil)
}

// [-0, +1, –]
// http://www.lua.org/manual/5.3/manual.html#lua_pushboolean
func (state *luaState) PushBoolean(b bool) {
	state.stack.push(b)
}

// [-0, +1, –]
// http://www.lua.org/manual/5.3/manual.html#lua_pushinteger
func (state *luaState) PushInteger(n int64) {
	state.stack.push(n)
}

// [-0, +1, –]
// http://www.lua.org/manual/5.3/manual.html#lua_pushnumber
func (state *luaState) PushNumber(n float64) {
	state.stack.push(n)
}

// [-0, +1, m]
// http://www.lua.org/manual/5.3/manual.html#lua_pushstring
func (state *luaState) PushString(s string) {
	state.stack.push(s)
}

// [-0, +1, e]
// http://www.lua.org/manual/5.3/manual.html#lua_pushfstring
func (state *luaState) PushFString(fmtStr string, a ...interface{}) string {
	str := fmt.Sprintf(fmtStr, a...)
	state.stack.push(str)
	return str
}

4.3.5 Access 方法

Access 系列方法用于从栈里获取信息.

TypeName() 获取类型的字符串名称.

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_typename
// lua-5.3.4/src/lapi.c#lua_typename()
func (state *luaState) TypeName(tp LuaType) string {
	switch tp {
	case LUA_TNONE:
		return "no value"
	case LUA_TNIL:
		return "nil"
	case LUA_TBOOLEAN:
		return "boolean"
	case LUA_TNUMBER:
		return "number"
	case LUA_TSTRING:
		return "string"
	case LUA_TTABLE:
		return "table"
	case LUA_TFUNCTION:
		return "function"
	case LUA_TUSERDATA:
		return "userdata"
	case LUA_TTHREAD:
		return "thread"
	default:
		panic("unreachable!")
	}
}

Type() 根据索引反获值的类型.

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_type
// lua-5.3.4/src/lapi.c#lua_type()
func (state *luaState) Type(idx int) LuaType {
	if state.stack.isValid(idx) {
		val := state.stack.get(idx)
		return typeOf(val)
	} else {
		return LUA_TNONE
	}
}

IsType, IsNone, IsNil, IsNoneOrNil, IsBoolean.

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_isnone
// lua-5.3.4/src/lua.h#lua_isnone()
func (state *luaState) IsNone(idx int) bool {
	return state.Type(idx) == LUA_TNONE
}

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_isnil
// lua-5.3.4/src/lua.h#lua_isnil()
func (state *luaState) IsNil(idx int) bool {
	return state.Type(idx) == LUA_TNIL
}

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_isnoneornil
// lua-5.3.4/src/lua.h#lua_isnoneornil()
func (state *luaState) IsNoneOrNil(idx int) bool {
	return state.Type(idx) <= 0
}

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_isboolean
// lua-5.3.4/src/lua.h#lua_isboolean()
func (state *luaState) IsBoolean(idx int) bool {
	return state.Type(idx) == LUA_TBOOLEAN
}

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_isstring
// lua-5.3.4/src/lapi.c#lua_isstring()
func (state *luaState) IsString(idx int) bool {
	t := state.Type(idx)
	return t == LUA_TSTRING || t == LUA_TNUMBER
}

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_isnumber
// lua-5.3.4/src/lapi.c#lua_isnumber()
func (state *luaState) IsNumber(idx int) bool {
	_, ok := state.ToNumberX(idx)
	return ok
}

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_isinteger
// lua-5.3.4/src/lapi.c#lua_isinteger()
func (state *luaState) IsInteger(idx int) bool {
	val := state.stack.get(idx)
	_, ok := val.(int64)
	return ok
}

ToBoolean() 从指定索引处取出一个布尔值, 如果不是布尔类型, 则进行类型转换.

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_toboolean
func (state *luaState) ToBoolean(idx int) bool {
	val := state.stack.get(idx)
	return convertToBoolean(val)
}

convertToBoolean():

func convertToBoolean(val luaValue) bool {
	switch x := val.(type) {
	case nil:
		return false
	case bool:
		return x
	default:
		return true
	}
}

ToNumber 和 ToNumberX, 从指定索引处取出一个数字. 如果不是数字类型则进行类型转换. 转换失败时, ToNumber 返回 0, ToNumberX 返回是否成功.

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_tonumber
func (state *luaState) ToNumber(idx int) float64 {
	n, _ := state.ToNumberX(idx)
	return n
}

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_tonumberx
func (state *luaState) ToNumberX(idx int) (float64, bool) {
	val := state.stack.get(idx)
	return convertToFloat(val)
}

ToInteger, ToIntegerX

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_tointeger
func (state *luaState) ToInteger(idx int) int64 {
	i, _ := state.ToIntegerX(idx)
	return i
}

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_tointegerx
func (state *luaState) ToIntegerX(idx int) (int64, bool) {
	val := state.stack.get(idx)
	return convertToInteger(val)
}

ToString, ToStringX 同样, 不过该函数会修改栈. 将非 string 转换为 string.

// [-0, +0, m]
// http://www.lua.org/manual/5.3/manual.html#lua_tostring
// http://www.lua.org/manual/5.3/manual.html#lua_tolstring
// lua-5.3.4/src/lua.h#lua_tostring()
// lua-5.3.4/src/lapi.c#lua_tolstring()
func (state *luaState) ToString(idx int) string {
	s, _ := state.ToStringX(idx)
	return s
}

func (state *luaState) ToStringX(idx int) (string, bool) {
	val := state.stack.get(idx)

	switch x := val.(type) {
	case string:
		return x, true
	case int64, float64:
		s := fmt.Sprintf("%v", x) // todo
		state.stack.set(idx, s)   // 这里会修改栈
		return s, true
	default:
		return "", false
	}
}

4.4 测试本章代码

4.5 本章小结

Chapter 5, Lua 运算符

可分为算术 (Arithmetic) 运算符, 按位 (Bitwise) 运算符, 比较 (Comparison) 运算符, 逻辑 (Logical) 运算符, 长度运算符, 字符串拼接运算符.

5.1 Lua 运算符介绍

1. 算数运算符

需要考虑的问题有:

具体实现:

package number

import "math"

func FloatToInteger(f float64) (int64, bool) {
	// todo: correct?
	i := int64(f)
	return i, float64(i) == f
}

// a % b == a - ((a // b) * b)
func IMod(a, b int64) int64 {
	return a - IFloorDiv(a, b)*b
}

// a % b == a - ((a // b) * b)
// lua-5.3.4/src/llimits.h#luai_nummod
func FMod(a, b float64) float64 {
	if a > 0 && math.IsInf(b, 1) || a < 0 && math.IsInf(b, -1) {
		return a
	}
	if a > 0 && math.IsInf(b, -1) || a < 0 && math.IsInf(b, 1) {
		return b
	}
	return a - math.Floor(a/b)*b
}

func IFloorDiv(a, b int64) int64 {
	if a > 0 && b > 0 || a < 0 && b < 0 || a%b == 0 {
		return a / b
	} else {
		return a/b - 1
	}
}

func FFloorDiv(a, b float64) float64 {
	return math.Floor(a / b)
}

func ShiftLeft(a, n int64) int64 {
	if n >= 0 {
		return a << n
	} else {
		return ShiftRight(a, -n)
	}
}

func ShiftRight(a, n int64) int64 {
	if n >= 0 {
		return int64(uint64(a) >> n)
	} else {
		return ShiftLeft(a, -n)
	}
}

2. 按位运算符

同上, 其中左移和右移需要特殊处理.

3. 比较运算符

实现 ==, <, <= 后, 就可以做等价运算了.

4. 逻辑运算符

Lua 对逻辑与和逻辑或表达式进行短路求值.

5. 长度运算符

#.

6. 字符串拼接运算符

...

5.2 类型自动转换

1. 转换规则

2. 浮点数转换为整数

直接转换.

3. 字符串解析为数字

func ParseInteger(str string) (int64, bool) {
	str = strings.TrimSpace(str)
	str = strings.ToLower(str)

	if !reInteger.MatchString(str) { // float?
		return 0, false
	}
	if str[0] == '+' {
		str = str[1:]
	}
	if strings.Index(str, "0x") < 0 { // decimal
		i, err := strconv.ParseInt(str, 10, 64)
		return i, err == nil
	}

	// hex
	var sign int64 = 1
	if str[0] == '-' {
		sign = -1
		str = str[3:]
	} else {
		str = str[2:]
	}

	if len(str) > 16 {
		str = str[len(str)-16:] // cut long hex string
	}

	i, err := strconv.ParseUint(str, 16, 64)
	return sign * int64(i), err == nil
}

func ParseFloat(str string) (float64, bool) {
	str = strings.TrimSpace(str)
	str = strings.ToLower(str)
	if strings.Contains(str, "nan") || strings.Contains(str, "inf") {
		return 0, false
	}
	if strings.HasPrefix(str, "0x") && len(str) > 2 {
		return parseHexFloat(str[2:])
	}
	if strings.HasPrefix(str, "+0x") && len(str) > 3 {
		return parseHexFloat(str[3:])
	}
	if strings.HasPrefix(str, "-0x") && len(str) > 3 {
		f, ok := parseHexFloat(str[3:])
		return -f, ok
	}
	f, err := strconv.ParseFloat(str, 64)
	return f, err == nil
}

目前实现还没这么复杂, 直接用的 Go 内置的转换.

4. 任意值转换为浮点数

5. 任意值转换为整数

这两个一样, 都是先判断类型, 数字类型可以进行转换, 字符串尝试转换, 其他类型则是0.

5.3 扩展 LuaState 接口

type LuaState interface {
	BasicAPI
	DebugAPI
	AuxLib
	String() string // debug
}

type BasicAPI interface {
	Arith(op ArithOp)                          // b=pop(); a=pop(); push(a op b)
    Compare(idx1, idx2 int, op CompareOp) bool // r[idx1] op r[idx2]
    Concat(n int)                 // push(concat(pop(n)))
	Len(idx int)                  // push(len(r[idx]))
}

新增实现这几个 LuaState 接口.

定义运算符:

/* arithmetic functions */
const (
	LUA_OPADD  = iota // +
	LUA_OPSUB         // -
	LUA_OPMUL         // *
	LUA_OPMOD         // %
	LUA_OPPOW         // ^
	LUA_OPDIV         // /
	LUA_OPIDIV        // //
	LUA_OPBAND        // &
	LUA_OPBOR         // |
	LUA_OPBXOR        // ~
	LUA_OPSHL         // <<
	LUA_OPSHR         // >>
	LUA_OPUNM         // -
	LUA_OPBNOT        // ~
)

5.3.1 Arith 方法

var (
	iadd = func(a, b int64) int64 { return a + b }
	fadd = func(a, b float64) float64 { return a + b }
	isub = func(a, b int64) int64 { return a - b }
	fsub = func(a, b float64) float64 { return a - b }
	imul = func(a, b int64) int64 { return a * b }
	fmul = func(a, b float64) float64 { return a * b }
	div  = func(a, b float64) float64 { return a / b }
	iunm = func(a, _ int64) int64 { return -a }
	funm = func(a, _ float64) float64 { return -a }
	band = func(a, b int64) int64 { return a & b }
	bor  = func(a, b int64) int64 { return a | b }
	bxor = func(a, b int64) int64 { return a ^ b }
	bnot = func(a, _ int64) int64 { return ^a }
)

容纳整数和浮点数类型的运算的结构体为:

type operator struct {
	metamethod  string
	integerFunc func(int64, int64) int64
	floatFunc   func(float64, float64) float64
}

操作符 slice, 用于包装操作不同数据类型的同一操作符, 即上面的结构体的实例化:

var operators = []operator{
	operator{"__add", iadd, fadd},
	operator{"__sub", isub, fsub},
	operator{"__mul", imul, fmul},
	operator{"__mod", number.IMod, number.FMod},
	operator{"__pow", nil, math.Pow},
	operator{"__div", nil, div},
	operator{"__idiv", number.IFloorDiv, number.FFloorDiv},
	operator{"__band", band, nil},
	operator{"__bor", bor, nil},
	operator{"__bxor", bxor, nil},
	operator{"__shl", number.ShiftLeft, nil},
	operator{"__shr", number.ShiftRight, nil},
	operator{"__unm", iunm, funm},
	operator{"__bnot", bnot, nil},
}

实现 Arith():

// [-(2|1), +1, e]
// http://www.lua.org/manual/5.3/manual.html#lua_arith
func (state *luaState) Arith(op ArithOp) {
    // 从栈取出 oprands
	var a, b luaValue // operands
	b = state.stack.pop()
	if op != LUA_OPUNM && op != LUA_OPBNOT {
		a = state.stack.pop()
	} else {
		a = b
	}

	operator := operators[op]
	if result := _arith(a, b, operator); result != nil {
		state.stack.push(result)
		return
	}

	mm := operator.metamethod
	if result, ok := callMetamethod(a, b, mm, state); ok {
		state.stack.push(result)
		return
	}

	if operator.floatFunc == nil {
		panic("number has no integer representation")
	}

	var typeName string
	if _, ok := convertToFloat(a); !ok {
		typeName = state.TypeName(typeOf(a))
	} else {
		typeName = state.TypeName(typeOf(b))
	}
	panic("attempt to perform arithmetic on a " + typeName + " value")
}

逻辑是取出操作数, 然后根据 op 选择 Arith op 然后执行, 最后压入栈.

_arith() 则是具体运算逻辑:

func _arith(a, b luaValue, op operator) luaValue {
	if op.floatFunc == nil { // bitwise
		if x, ok := convertToInteger(a); ok {
			if y, ok := convertToInteger(b); ok {
				return op.integerFunc(x, y)
			}
		}
	} else { // arith
		if op.integerFunc != nil { // add,sub,mul,mod,idiv,unm
			if x, ok := a.(int64); ok {
				if y, ok := b.(int64); ok {
					return op.integerFunc(x, y)
				}
			}
		}
		if x, ok := convertToFloat(a); ok {
			if y, ok := convertToFloat(b); ok {
				return op.floatFunc(x, y)
			}
		}
	}
	return nil
}

即按具体操作数决定是否类型转换后进行操作.

5.3.2 Compare() 方法

Compare() 方法比较栈上两个值, 但不对栈做出修改.

// [-0, +0, e]
// http://www.lua.org/manual/5.3/manual.html#lua_compare
func (state *luaState) Compare(idx1, idx2 int, op CompareOp) bool {
	if !state.stack.isValid(idx1) || !state.stack.isValid(idx2) {
		return false
	}

	a := state.stack.get(idx1)
	b := state.stack.get(idx2)
	switch op {
	case LUA_OPEQ:
		return state.eq(a, b, false)
	case LUA_OPLT:
		return state.lt(a, b)
	case LUA_OPLE:
		return state.le(a, b)
	default:
		panic("invalid compare op!")
	}
}

同样, 具体执行 compare 的 op 需要依据操作码进行选择方法操作:

func (state *luaState) eq(a, b luaValue, raw bool) bool {
	switch x := a.(type) {
	case nil:
		return b == nil
	case bool:
		y, ok := b.(bool)
		return ok && x == y
	case string:
		y, ok := b.(string)
		return ok && x == y
	case int64:
		switch y := b.(type) {
		case int64:
			return x == y
		case float64:
			return float64(x) == y
		default:
			return false
		}
	case float64:
		switch y := b.(type) {
		case float64:
			return x == y
		case int64:
			return x == float64(y)
		default:
			return false
		}
	case *luaTable:
		if y, ok := b.(*luaTable); ok && x != y && !raw {
			if result, ok := callMetamethod(x, y, "__eq", state); ok {
				return convertToBoolean(result)
			}
		}
		return a == b
	case *userdata:
		if y, ok := b.(*userdata); ok {
			return x.data == y.data
		}
		return false
	default:
		return a == b
	}
}

func (state *luaState) lt(a, b luaValue) bool {
	switch x := a.(type) {
	case string:
		if y, ok := b.(string); ok {
			return x < y
		}
	case int64:
		switch y := b.(type) {
		case int64:
			return x < y
		case float64:
			return float64(x) < y
		}
	case float64:
		switch y := b.(type) {
		case float64:
			return x < y
		case int64:
			return x < float64(y)
		}
	}

	if result, ok := callMetamethod(a, b, "__lt", state); ok {
		return convertToBoolean(result)
	}
	typeName1 := state.TypeName(typeOf(a))
	typeName2 := state.TypeName(typeOf(b))
	panic("attempt to compare " + typeName1 + " with " + typeName2)
}

func (state *luaState) le(a, b luaValue) bool {
	switch x := a.(type) {
	case string:
		if y, ok := b.(string); ok {
			return x <= y
		}
	case int64:
		switch y := b.(type) {
		case int64:
			return x <= y
		case float64:
			return float64(x) <= y
		}
	case float64:
		switch y := b.(type) {
		case float64:
			return x <= y
		case int64:
			return x <= float64(y)
		}
	}

	if result, ok := callMetamethod(a, b, "__le", state); ok {
		return convertToBoolean(result)
	}
	if result, ok := callMetamethod(b, a, "__lt", state); ok {
		return !convertToBoolean(result)
	}
	typeName1 := state.TypeName(typeOf(a))
	typeName2 := state.TypeName(typeOf(b))
	panic("attempt to compare " + typeName1 + " with " + typeName2)
}

这路i的对比形式是, 现检测 a 的值类型, 然后将 b 转换到 a 一致的类型, 然后进行比较.

大小比较则只有字符串和数字才有意义.

5.3.3 Len() 方法

访问指定位置值, 求长度后将结果压入栈顶.

// [-0, +1, e]
// http://www.lua.org/manual/5.3/manual.html#lua_len
func (state *luaState) Len(idx int) {
	val := state.stack.get(idx)
	if s, ok := val.(string); ok {
		state.stack.push(int64(len(s)))
	} else if result, ok := callMetamethod(val, val, "__len", state); ok {
		state.stack.push(result)
	} else if t, ok := val.(*luaTable); ok {
		state.stack.push(int64(t.len()))
	} else {
		typeName := state.TypeName(typeOf(val))
		panic("attempt to get length of a " + typeName + " value")
	}
}

5.3.4 Concat() 方法

弹出 n 个值, 然后拼接后压入栈顶.

// [-n, +1, e]
// http://www.lua.org/manual/5.3/manual.html#lua_concat
func (state *luaState) Concat(n int) {
	if n == 0 {
		state.stack.push("")
	} else if n >= 2 {
		for i := 1; i < n; i++ {
			if state.IsString(-1) && state.IsString(-2) {
				s2 := state.ToString(-1)
				s1 := state.ToString(-2)
				state.stack.pop()
				state.stack.pop()
				state.stack.push(s1 + s2)
				continue
			}

			b := state.stack.pop()
			a := state.stack.pop()
			if result, ok := callMetamethod(a, b, "__concat", state); ok {
				state.stack.push(result)
				continue
			}

			var typeName string
			if _, ok := convertToFloat(a); !ok {
				typeName = state.TypeName(typeOf(a))
			} else {
				typeName = state.TypeName(typeOf(b))
			}
			panic("attempt to concatenate a " + typeName + " value")
		}
	}
	// n == 1, do nothing
}

5.4 测试本章代码

Chapter 6, 虚拟机雏形

6.1 添加 LuaVM 接口

简单描述了 LuaVM 需要 PC 寄存器, 以及构成和执行方式.

6.1.1 定义 LuaVM 接口

LuaVM 接口:

type LuaVM interface {
	LuaState
	AddPC(n int)         // pc += n
	Fetch() uint32       // code[pc++]
	RegisterCount() int  // proto.MaxStackSize
	GetConst(idx int)    // push(const[idx])
	GetRK(rk int)        // rk > 0xFF ? GetConst(rk & 0xFF) : PushValue(rk + 1)
	LoadProto(idx int)   // push(proto[idx] as LuaFunction)
	LoadVararg(n int)    // push(vararg[0], ..., vararg[n-1])
	CloseUpvalues(a int) // close all upvalues >= R(A - 1)
}

即主要操作核心都是操作 PC 或者修改栈顶.

6.1.2 改造 luaState 结构体

type luaState struct {
	/* global state */
	hook     LuaHook
	hookMask int
	panicf   GoFunction
	registry *luaTable
	/* stack */
	stack     *luaStack
	callDepth int
	/* coroutine */
	coStatus ThreadStatus
	coCaller *luaState
	coChan   chan int
}

新增了 proto, 用于保存函数原型, pc 则是 pc 寄存器.

6.1.3 实现 LuaVM 接口

接下来实现刚才定义的 LuaVM 接口:

func (state *luaState) AddPC(n int) {
	state.stack.pc += n
}

func (state *luaState) Fetch() uint32 {
	i := state.stack.closure.proto.Code[state.stack.pc]
	state.stack.pc++
	return i
}

func (state *luaState) RegisterCount() int {
	return int(state.stack.closure.proto.MaxStackSize)
}

func (state *luaState) GetConst(idx int) {
	c := state.stack.closure.proto.Constants[idx]
	state.stack.push(c)
}

func (state *luaState) GetRK(rk int) {
	if rk > 0xFF { // constant
		state.GetConst(rk & 0xFF)
	} else { // register
		state.PushValue(rk + 1)
	}
}

AddPC() 和 Fetch() 就可以组成向前获取指令的方法.

GetConst() 从常量表取值并压入栈顶.

6.2 实现 Lua 虚拟机指令

6.2.1 移动和跳转指令

1. MOVE

move (iABC) 把源寄存器 (索引由操作数 B 指定) 里的值移动到目标寄存器 (索引由操作数A指定).

// R(A) := R(B)
func move(i Instruction, vm LuaVM) {
	a, b, _ := i.ABC()
	a += 1
	b += 1

	vm.Copy(b, a)
}

+1 将寄存器索引转换为栈索引.

Copy 将值拷贝到栈.

// [-0, +0, –]
// http://www.lua.org/manual/5.3/manual.html#lua_copy
// lua-5.3.4/src/lapi.c#lua_copy()
func (state *luaState) Copy(fromIdx, toIdx int) {
	val := state.stack.get(fromIdx)
	state.stack.set(toIdx, val)
}

由于局部变量在寄存器中, 因此最大局部变量数量不能超过 oprand A 的长度 8, 因此最大能容纳 255 个. 实际上被限制到了 200 个以内.

2. JMP

JMP (iAsBx) 执行无条件跳转.

// pc+=sBx; if (A) close all upvalues >= R(A - 1)
func jmp(i Instruction, vm LuaVM) {
	a, sBx := i.AsBx()

	vm.AddPC(sBx)
	if a != 0 {
		vm.CloseUpvalues(a)
	}
}
func (state *luaState) AddPC(n int) {
	state.stack.pc += n
}

6.2.2 加载指令

用于把 nil, 布尔, 常量加载到寄存器里.

1. LOADNIL

loadnil (iABC), 用于给连续 n 个进粗气放置 nil 值. 寄存器的起始索引由操作数 A 指定, 寄存器数量 B 指定.

// R(A), R(A+1), ..., R(A+B) := nil
func loadNil(i Instruction, vm LuaVM) {
	a, b, _ := i.ABC()
	a += 1

	//vm.CheckStack(1)
	vm.PushNil() // ~/nil
	for i := a; i <= a+b; i++ {
		vm.Copy(-1, i)
	}
	vm.Pop(1) // ~
}

实现方式是先将一个nil值压入栈顶, 然后拷贝覆盖, 最后再弹出压入的多余的值.

2. LOADBOOL

loadbool (iABC) 给单个寄存器设置布尔值. 寄存器索引由 A 指定, 布尔值由 B 指定. C 非 0 则跳过下一条指令.

// R(A) := (bool)B; if (C) pc++
func loadBool(i Instruction, vm LuaVM) {
	a, b, c := i.ABC()
	a += 1

	//vm.CheckStack(1)
	vm.PushBoolean(b != 0) // ~/b
	vm.Replace(a)          // ~

	if c != 0 {
		vm.AddPC(1)
	}
}

3. LOADK 和 LOADKX

LOADK (iABx) 将常量表里的某个常量加载到指定寄存器. 寄存器索引由操作数 A 指定, 常量表索引由 Bx 指定.

Lua 函数里出现的字面量 (数字和字符串) 会被 Lua 编译器收集起来, 放进常量表里面.

// R(A) := Kst(Bx)
func loadK(i Instruction, vm LuaVM) {
	a, bx := i.ABx()
	a += 1

	//vm.CheckStack(1)
	vm.GetConst(bx) // ~/k[bx]
	vm.Replace(a)   // ~
}

// R(A) := Kst(extra arg)
func loadKx(i Instruction, vm LuaVM) {
	a, _ := i.ABx()
	a += 1
	ax := Instruction(vm.Fetch()).Ax()

	//vm.CheckStack(1)
	vm.GetConst(ax) // ~/k[ax]
	vm.Replace(a)   // ~
}

6.2.3 算术运算指令

1. 二元算术运算指令

二元算术运算指令 iABC 模式, 对两个寄存器或常量值进行运算 (Oprand B, C), 将结果放入 Oprand A.

// R(A) := RK(B) op RK(C)
func _binaryArith(i Instruction, vm LuaVM, op ArithOp) {
	a, b, c := i.ABC()
	a += 1

	//vm.CheckStack(2)
	vm.GetRK(b)   // ~/rk[b]
	vm.GetRK(c)   // ~/rk[b]/rk[c]
	vm.Arith(op)  // ~/result
	vm.Replace(a) // ~
}

2. 一元算术运算指令

// R(A) := op R(B)
func _unaryArith(i Instruction, vm LuaVM, op ArithOp) {
	a, b, _ := i.ABC()
	a += 1
	b += 1

	//vm.CheckStack(1)
	vm.PushValue(b) // ~/r[b]
	vm.Arith(op)    // ~/result
	vm.Replace(a)   // ~
}

其他一元算术运算也均是这种模式.

6.2.4 长度和拼接指令

1. LEN

// R(A) := length of R(B)
func length(i Instruction, vm LuaVM) {
	a, b, _ := i.ABC()
	a += 1
	b += 1

	//vm.CheckStack(1)
	vm.Len(b)     // ~/#r[b]
	vm.Replace(a) // ~
}

2. CONCAT

// R(A) := R(B).. ... ..R(C)
func concat(i Instruction, vm LuaVM) {
	a, b, c := i.ABC()
	a += 1
	b += 1
	c += 1

	n := c - b + 1
	vm.CheckStack(n)
	for i := b; i <= c; i++ {
		vm.PushValue(i) // ~/r[b]/.../r[c]
	}
	vm.Concat(n)  // ~/result
	vm.Replace(a) // ~
}

这里的 B, C 为起止索引位置. 最终结果放入 A. 计算出的 n 即为拼接的栈长度. 检查 Stack 后, 调用 vm.Concat() 进行拼接. 另外还需要检查栈空间是否够用.

6.2.5 比较指令

iABC 模式. 可以比较寄存器或常量表里的两个值. 如果比较结果和 Oprand A (隐式转换为布尔值) 不匹配, 则跳过下一条指令. 即

if ((RK(B) op RK(C)) ~= A ) then pc++
// if ((RK(B) op RK(C)) ~= A) then pc++
func _compare(i Instruction, vm LuaVM, op CompareOp) {
	a, b, c := i.ABC()

	//vm.CheckStack(2)
	vm.GetRK(b) // ~/rk[b]
	vm.GetRK(c) // ~/rk[b]/rk[c]
	if vm.Compare(-2, -1, op) != (a != 0) {
		vm.AddPC(1)
	}
	vm.Pop(2) // ~
}

6.2.6 逻辑运算指令

1. NOT

iABC 模式.

// R(A) := not R(B)
func not(i Instruction, vm LuaVM) {
	a, b, _ := i.ABC()
	a += 1
	b += 1

	//vm.CheckStack(1)
	vm.PushBoolean(!vm.ToBoolean(b)) // ~/!r[b]
	vm.Replace(a)                    // ~
}

2. TESTSET

// if (R(B) <=> C) then R(A) := R(B) else pc++
func testSet(i Instruction, vm LuaVM) {
	a, b, c := i.ABC()
	a += 1
	b += 1

	if vm.ToBoolean(b) == (c != 0) {
		vm.Copy(b, a)
	} else {
		vm.AddPC(1)
	}
}

3. TEST

// if not (R(A) <=> C) then pc++
func test(i Instruction, vm LuaVM) {
	a, _, c := i.ABC()
	a += 1

	if vm.ToBoolean(a) != (c != 0) {
		vm.AddPC(1)
	}
}

6.2.7 for 循环指令

for 循环有 数值 和 统用 两种形式.数值 for 用于按一定步长遍历某个范围内的数值. 通用 for 则用于便利表.

// R(A)-=R(A+2); pc+=sBx
func forPrep(i Instruction, vm LuaVM) {
	a, sBx := i.AsBx()
	a += 1

	//vm.CheckStack(2)
	if vm.Type(a) == LUA_TSTRING {
		vm.PushNumber(vm.ToNumber(a))
		vm.Replace(a)
	}
	if vm.Type(a+1) == LUA_TSTRING {
		vm.PushNumber(vm.ToNumber(a + 1))
		vm.Replace(a + 1)
	}
	if vm.Type(a+2) == LUA_TSTRING {
		vm.PushNumber(vm.ToNumber(a + 2))
		vm.Replace(a + 2)
	}

	vm.PushValue(a)     // ~/r[a]
	vm.PushValue(a + 2) // ~/r[a]/r[a+2]
	vm.Arith(LUA_OPSUB) // ~/r[a]-r[a+2]
	vm.Replace(a)       // ~
	vm.AddPC(sBx)
}

// R(A)+=R(A+2);
// if R(A) <?= R(A+1) then {
//   pc+=sBx; R(A+3)=R(A)
// }
func forLoop(i Instruction, vm LuaVM) {
	a, sBx := i.AsBx()
	a += 1

	//vm.CheckStack(2)
	// R(A)+=R(A+2);
	vm.PushValue(a + 2) // ~/r[a+2]
	vm.PushValue(a)     // ~/r[a+2]/r[a]
	vm.Arith(LUA_OPADD) // ~/r[a]+r[a+2]
	vm.Replace(a)       // ~

	isPositiveStep := vm.ToNumber(a+2) >= 0
	if isPositiveStep && vm.Compare(a, a+1, LUA_OPLE) ||
		!isPositiveStep && vm.Compare(a+1, a, LUA_OPLE) {

		// pc+=sBx; R(A+3)=R(A)
		vm.AddPC(sBx)
		vm.Copy(a, a+3)
	}
}

FORPREP :

R(A) -= R(A+2);
pc += sBx

FORLOOP:

R(A)+=R(A+2);
if R(A) <?= R(A+1) then {
  pc+=sBx; R(A+3)=R(A)
}

6.3 指令分派 (Dispatch)

func (instr Instruction) Execute(vm api.LuaVM) {
	opcodes[instr.Opcode()].action(instr, vm)
}

6.4 测试本章代码

6.5 本章小结

Chapter 7, 表

7.1 表介绍

Lua 表本质上是关联数组, Associative Array, Dictionary, Map. 存放的是两两关联的键值对. 表有 Record 和 List (数字下标) 之分.

7.2 表内部实现

Lua 优化了作为数组和Map时的具体表现.

type luaTable struct {
	metatable *luaTable
	arr       []luaValue
	_map      map[luaValue]luaValue
	keys      map[luaValue]luaValue // used by next()
	lastKey   luaValue              // used by next()
	changed   bool                  // used by next()
}

arr 存放数组, _map 存放 Map. 然后是初始化函数.

func newLuaTable(nArr, nRec int) *luaTable {
	t := &luaTable{}
	if nArr > 0 {
		t.arr = make([]luaValue, 0, nArr)
	}
	if nRec > 0 {
		t._map = make(map[luaValue]luaValue, nRec)
	}
	return t
}

参数用于预估长度.

func (table *luaTable) get(key luaValue) luaValue {
	key = _floatToInteger(key)
	if idx, ok := key.(int64); ok {
		if idx >= 1 && idx <= int64(len(table.arr)) {
			return table.arr[idx-1]
		}
	}
	return table._map[key]
}

先转换成整数然后查看是否在范围内, 然后进行查找值.

func _floatToInteger(key luaValue) luaValue {
	if f, ok := key.(float64); ok {
		if i, ok := number.FloatToInteger(f); ok {
			return i
		}
	}
	return key
}
func (table *luaTable) put(key, val luaValue) {
	if key == nil {
		panic("table index is nil!")
	}
	if f, ok := key.(float64); ok && math.IsNaN(f) {
		panic("table index is NaN!")
	}

	table.changed = true
	key = _floatToInteger(key)
	if idx, ok := key.(int64); ok && idx >= 1 {
		arrLen := int64(len(table.arr))
		if idx <= arrLen {
			table.arr[idx-1] = val
			if idx == arrLen && val == nil {
				table._shrinkArray()
			}
			return
		}
		if idx == arrLen+1 {
			delete(table._map, key)
			if val != nil {
				table.arr = append(table.arr, val)
				table._expandArray()
			}
			return
		}
	}
	if val != nil {
		if table._map == nil {
			table._map = make(map[luaValue]luaValue, 8)
		}
		table._map[key] = val
	} else {
		delete(table._map, key)
	}
}

这里涉及了 Lua 表的特性.

7.3 表相关 API

Chapter 8, 函数调用

8.1 函数调用介绍

函数调用有一些特性:

8.2 函数调用栈

同样, 为了处理调用关系, 用了栈的结构对调用过程进行抽象.

8.2.1 调用帧实现

type luaStack struct {
	/* virtual stack */
	slots []luaValue
	top   int
	/* call info */
	state   *luaState
	closure *closure
	varargs []luaValue
	openuvs map[int]*upvalue
	pc      int
	/* linked list */
	prev *luaStack
}

新增了 prev, closure, varargs, pc.

然后定义 closure 结构体.

type closure struct {
	proto  *binchunk.Prototype // lua closure
	goFunc GoFunction          // go closure
	upvals []*upvalue
}

proto 用于存放函数原型.

8.2.2 调用栈实现

调用栈采用链表实现:

func (state *luaState) pushLuaStack(stack *luaStack) {
	stack.prev = state.stack
	state.stack = stack
	state.callDepth++
}

func (state *luaState) popLuaStack() {
	stack := state.stack
	state.stack = stack.prev
	stack.prev = nil
	state.callDepth--
}

8.3 函数调用 API

8.3.1 Load()

load() 方法加载二进制 chunk, 把主函数原型实例化为闭包并推入栈顶. 该方法同样可以加载 Lua 脚本.
解析流程为读取文件, 解析主函数原型, 实例化为闭包, 推入栈顶.

// [-0, +1, –]
// http://www.lua.org/manual/5.3/manual.html#lua_load
func (state *luaState) Load(chunk []byte, chunkName, mode string) (status ThreadStatus) {
	status = LUA_ERRSYNTAX

	// catch error
	defer func() {
		if r := recover(); r != nil {
			state.stack.push(_getErrObj(r))
		}
	}()

	var proto *binchunk.Prototype
	if binchunk.IsBinaryChunk(chunk) {
		if mode == "t" {
			panic("attempt to load a binary chunk (mode is '" + mode + "')")
		}
		proto = binchunk.Undump(chunk)
	} else {
		if mode == "b" {
			panic("attempt to load a text chunk (mode is '" + mode + "')")
		}
		proto = compiler.Compile(string(chunk), chunkName)
	}

	c := newLuaClosure(proto)
	if len(proto.Upvalues) > 0 {
		env := state.registry.get(LUA_RIDX_GLOBALS)
		c.upvals[0] = &upvalue{&env}
	}
	state.stack.push(c)
	status = LUA_OK
	return
}

8.3.2 Call()

将被调用函数推入栈顶, 接着推入函数参数, 即可调用 call(). 执行完毕后这些被弹出, 并将结果压入.

// [-(nargs+1), +nresults, e]
// http://www.lua.org/manual/5.3/manual.html#lua_call
func (state *luaState) Call(nArgs, nResults int) {
	val := state.stack.get(-(nArgs + 1))

	c, ok := val.(*closure)
	if !ok {
		if mf := getMetafield(val, "__call", state); mf != nil {
			if c, ok = mf.(*closure); ok {
				state.stack.push(val)
				state.Insert(-(nArgs + 2))
				nArgs += 1
			}
		}
	}

	if ok {
		if c.proto != nil {
			state.callLuaClosure(nArgs, nResults, c)
		} else {
			state.callGoClosure(nArgs, nResults, c)
		}
	} else {
		typeName := state.TypeName(typeOf(val))
		panic("attempt to call a " + typeName + " value")
	}
}

state.callLuaClosure() 则充当了函数的实际调用者.

func (state *luaState) callLuaClosure(nArgs, nResults int, c *closure) {
	nRegs := int(c.proto.MaxStackSize)
	nParams := int(c.proto.NumParams)
	isVararg := c.proto.IsVararg == 1

	// create new lua stack
	newStack := newLuaStack(nRegs+LUA_MINSTACK, state)
	newStack.closure = c

	// pass args, pop func
	funcAndArgs := state.stack.popN(nArgs + 1)
	newStack.pushN(funcAndArgs[1:], nParams)
	newStack.top = nRegs
	if nArgs > nParams && isVararg {
		newStack.varargs = funcAndArgs[nParams+1:]
	}

	// run closure
	state.pushLuaStack(newStack)
	state.runLuaClosure()
	state.popLuaStack()

	// return results
	if nResults != 0 {
		results := newStack.popN(newStack.top - nRegs)
		state.stack.check(len(results))
		state.stack.pushN(results, nResults)
	}
}

调用过程为创建新的 stack, 传入参数, 弹出目标函数, 运行 closure, 最后将结果放入 stack.

state.runLuaClosure() 为内部执行具体函数.

func (state *luaState) runLuaClosure() {
	for {
		inst := vm.Instruction(state.Fetch())
		inst.Execute(state)

		// indent := fmt.Sprintf("%%%ds", state.callDepth*2)
		// fmt.Printf(indent+"[%02d: %s] => %s\n",
		// 	"", pc+1, inst.OpName(), state)

		if inst.Opcode() == vm.OP_RETURN {
			break
		}
	}
}

这里的 state.Fetch() 根据 PC 获取当前指令, 然后 inst.Execute() 执行指令.

8.4 函数调用指令

8.4.1 closure

closure 指令, iBx 模式, 把当前函数的子函数原型实例化为闭包, 放入由操作数 A 指定的寄存器中.

// R(A) := closure(KPROTO[Bx])
func closure(i Instruction, vm LuaVM) {
	a, bx := i.ABx()
	a += 1

	//vm.CheckStack(1)
	vm.LoadProto(bx) // ~/closure
	vm.Replace(a)    // ~
}

8.4.1 CALL

call 指令, iABC 模式, 调用 Lua 函数.

// R(A), ... ,R(A+C-2) := R(A)(R(A+1), ... ,R(A+B-1))
func call(i Instruction, vm LuaVM) {
	a, b, c := i.ABC()
	a += 1

	// println(":::"+ vm.StackToString())
	nArgs := _pushFuncAndArgs(a, b, vm)
	vm.Call(nArgs, c-1)
	_popResults(a, c, vm)
}

8.4.3 return

return 指令, iABC 模式, 把存放在连续多个寄存器里的返回值返回给主调函数.

// return R(A), ... ,R(A+B-2)
func _return(i Instruction, vm LuaVM) {
	a, b, _ := i.ABC()
	a += 1

	if b == 1 {
		// no return values
	} else if b > 1 {
		// b-1 return values
		vm.CheckStack(b - 1)
		for i := a; i <= a+b-2; i++ {
			vm.PushValue(i)
		}
	} else {
		_fixStack(a, vm)
	}
}

其中第一个寄存器索引由操作数 A 指定, 寄存器数量由操作数 B 指定.

8.4.4 vararg

vararg 指令, iABC 模式, 把传递给当前函数的变长参数加载到连续多个寄存器中. 操作数 A 指定寄存器索引, 操作数 B 指定寄存器数量.

// R(A), R(A+1), ..., R(A+B-2) = vararg
func vararg(i Instruction, vm LuaVM) {
	a, b, _ := i.ABC()
	a += 1

	if b != 1 { // b==0 or b>1
		vm.LoadVararg(b - 1)
		_popResults(a, b, vm)
	}
}

8.4.5 tailcall

return f(args) 会被优化成尾递归, 即 tailcall 指令.

// return R(A)(R(A+1), ... ,R(A+B-1))
func tailCall(i Instruction, vm LuaVM) {
	a, b, _ := i.ABC()
	a += 1

	// todo: optimize tail call!
	c := 0
	nArgs := _pushFuncAndArgs(a, b, vm)
	vm.Call(nArgs, c-1)
	_popResults(a, c, vm)
}

8.4.6 self

self 指令, iABC 模式, 把对向和方法拷贝到相邻的两个目标寄存器中. 对象在寄存器中索引由操作数 B 指定. 方法名在常量表, 索引由操作数 C 指定. 目标寄存器索引由 A 指定.

// R(A+1) := R(B); R(A) := R(B)[RK(C)]
func self(i Instruction, vm LuaVM) {
	a, b, c := i.ABC()
	a += 1
	b += 1

	vm.Copy(b, a+1)

	//vm.CheckStack(1)
	vm.GetRK(c)    // ~/rk[c]
	vm.GetTable(b) // ~/r[b][rk[c]]
	vm.Replace(a)  // ~
}

8.4.7 扩展 LuaVM 接口

8.4.8 改进 SETLIST指令

8.5 测试本章代码

8.6 本章小结

Chapter 14, 词法分析

14.1 编译器介绍

14.2 Lua 词法介绍

14.3 实现词法分析器

Lexer 结构体:

type Lexer struct {
	chunk         string // source code
	chunkName     string // source name
	line          int    // current line number
	nextToken     string
	nextTokenKind int
	nextTokenLine int
}

定义 NextToken():

func (lexer *Lexer) NextToken() (line, kind int, token string) {
	if lexer.nextTokenLine > 0 {
		line = lexer.nextTokenLine
		kind = lexer.nextTokenKind
		token = lexer.nextToken
		lexer.line = lexer.nextTokenLine
		lexer.nextTokenLine = 0
		return
	}

	lexer.skipWhiteSpaces()
	if len(lexer.chunk) == 0 {
		return lexer.line, TOKEN_EOF, "EOF"
	}

	switch lexer.chunk[0] {
	case ';':
		lexer.next(1)
		return lexer.line, TOKEN_SEP_SEMI, ";"
	case ',':
		lexer.next(1)
		return lexer.line, TOKEN_SEP_COMMA, ","
	case '(':
		lexer.next(1)
		return lexer.line, TOKEN_SEP_LPAREN, "("
	case ')':
		lexer.next(1)
		return lexer.line, TOKEN_SEP_RPAREN, ")"
	case ']':
		lexer.next(1)
		return lexer.line, TOKEN_SEP_RBRACK, "]"
	case '{':
		lexer.next(1)
		return lexer.line, TOKEN_SEP_LCURLY, "{"
	case '}':
		lexer.next(1)
		return lexer.line, TOKEN_SEP_RCURLY, "}"
	case '+':
		lexer.next(1)
		return lexer.line, TOKEN_OP_ADD, "+"
	case '-':
		lexer.next(1)
		return lexer.line, TOKEN_OP_MINUS, "-"
	case '*':
		lexer.next(1)
		return lexer.line, TOKEN_OP_MUL, "*"
	case '^':
		lexer.next(1)
		return lexer.line, TOKEN_OP_POW, "^"
	case '%':
		lexer.next(1)
		return lexer.line, TOKEN_OP_MOD, "%"
	case '&':
		lexer.next(1)
		return lexer.line, TOKEN_OP_BAND, "&"
	case '|':
		lexer.next(1)
		return lexer.line, TOKEN_OP_BOR, "|"
	case '#':
		lexer.next(1)
		return lexer.line, TOKEN_OP_LEN, "#"
	case ':':
		if lexer.test("::") {
			lexer.next(2)
			return lexer.line, TOKEN_SEP_LABEL, "::"
		} else {
			lexer.next(1)
			return lexer.line, TOKEN_SEP_COLON, ":"
		}
	case '/':
		if lexer.test("//") {
			lexer.next(2)
			return lexer.line, TOKEN_OP_IDIV, "//"
		} else {
			lexer.next(1)
			return lexer.line, TOKEN_OP_DIV, "/"
		}
	case '~':
		if lexer.test("~=") {
			lexer.next(2)
			return lexer.line, TOKEN_OP_NE, "~="
		} else {
			lexer.next(1)
			return lexer.line, TOKEN_OP_WAVE, "~"
		}
	case '=':
		if lexer.test("==") {
			lexer.next(2)
			return lexer.line, TOKEN_OP_EQ, "=="
		} else {
			lexer.next(1)
			return lexer.line, TOKEN_OP_ASSIGN, "="
		}
	case '<':
		if lexer.test("<<") {
			lexer.next(2)
			return lexer.line, TOKEN_OP_SHL, "<<"
		} else if lexer.test("<=") {
			lexer.next(2)
			return lexer.line, TOKEN_OP_LE, "<="
		} else {
			lexer.next(1)
			return lexer.line, TOKEN_OP_LT, "<"
		}
	case '>':
		if lexer.test(">>") {
			lexer.next(2)
			return lexer.line, TOKEN_OP_SHR, ">>"
		} else if lexer.test(">=") {
			lexer.next(2)
			return lexer.line, TOKEN_OP_GE, ">="
		} else {
			lexer.next(1)
			return lexer.line, TOKEN_OP_GT, ">"
		}
	case '.':
		if lexer.test("...") {
			lexer.next(3)
			return lexer.line, TOKEN_VARARG, "..."
		} else if lexer.test("..") {
			lexer.next(2)
			return lexer.line, TOKEN_OP_CONCAT, ".."
		} else if len(lexer.chunk) == 1 || !isDigit(lexer.chunk[1]) {
			lexer.next(1)
			return lexer.line, TOKEN_SEP_DOT, "."
		}
	case '[':
		if lexer.test("[[") || lexer.test("[=") {
			return lexer.line, TOKEN_STRING, lexer.scanLongString()
		} else {
			lexer.next(1)
			return lexer.line, TOKEN_SEP_LBRACK, "["
		}
	case '\'', '"':
		return lexer.line, TOKEN_STRING, lexer.scanShortString()
	}

	c := lexer.chunk[0]
	if c == '.' || isDigit(c) {
		token := lexer.scanNumber()
		return lexer.line, TOKEN_NUMBER, token
	}
	if c == '_' || isLetter(c) {
		token := lexer.scanIdentifier()
		if kind, found := keywords[token]; found {
			return lexer.line, kind, token // keyword
		} else {
			return lexer.line, TOKEN_IDENTIFIER, token
		}
	}

	lexer.error("unexpected symbol near %q", c)
	return
}

lexer.next(n) 用于跳过指定个数的字符. 检测到是指定的字符或字符序列后返回 token 类型.

func (lexer *Lexer) test(s string) bool {
	return strings.HasPrefix(lexer.chunk, s)
}

14.3.1 定义 Token 类型

14.3.2 空白字符

func (lexer *Lexer) skipWhiteSpaces() {
	for len(lexer.chunk) > 0 {
		if lexer.test("--") {
			lexer.skipComment()
		} else if lexer.test("\r\n") || lexer.test("\n\r") {
			lexer.next(2)
			lexer.line += 1
		} else if isNewLine(lexer.chunk[0]) {
			lexer.next(1)
			lexer.line += 1
		} else if isWhiteSpace(lexer.chunk[0]) {
			lexer.next(1)
		} else {
			break
		}
	}
}

跳过空白和注释.

14.3.3 注释

func (lexer *Lexer) skipComment() {
	lexer.next(2) // skip --

	// long comment ?
	if lexer.test("[") {
		if reOpeningLongBracket.FindString(lexer.chunk) != "" {
			lexer.scanLongString()
			return
		}
	}

	// short comment
	for len(lexer.chunk) > 0 && !isNewLine(lexer.chunk[0]) {
		lexer.next(1)
	}
}

14.3.4 分隔符和运算符

14.3.5 长字符串字面量

func (lexer *Lexer) scanLongString() string {
	openingLongBracket := reOpeningLongBracket.FindString(lexer.chunk)
	if openingLongBracket == "" {
		lexer.error("invalid long string delimiter near '%s'",
			lexer.chunk[0:2])
	}

	closingLongBracket := strings.Replace(openingLongBracket, "[", "]", -1)
	closingLongBracketIdx := strings.Index(lexer.chunk, closingLongBracket)
	if closingLongBracketIdx < 0 {
		lexer.error("unfinished long string or comment")
	}

	str := lexer.chunk[len(openingLongBracket):closingLongBracketIdx]
	lexer.next(closingLongBracketIdx + len(closingLongBracket))

	str = reNewLine.ReplaceAllString(str, "\n")
	lexer.line += strings.Count(str, "\n")
	if len(str) > 0 && str[0] == '\n' {
		str = str[1:]
	}

	return str
}

即可以换行的字符串字面量的提取.

14.3.6 短字符串字面量

var reShortStr = regexp.MustCompile(`(?s)(^'(\\\\|\\'|\\\n|\\z\s*|[^'\n])*')|(^"(\\\\|\\"|\\\n|\\z\s*|[^"\n])*")`)

func (lexer *Lexer) scanShortString() string {
	if str := reShortStr.FindString(lexer.chunk); str != "" {
		lexer.next(len(str))
		str = str[1 : len(str)-1]
		if strings.Index(str, `\`) >= 0 {
			lexer.line += len(reNewLine.FindAllString(str, -1))
			str = lexer.escape(str)
		}
		return str
	}
	lexer.error("unfinished string")
	return ""
}

用正则匹配短字符串字面量.

func (lexer *Lexer) escape(str string) string {
	var buf bytes.Buffer

	for len(str) > 0 {
		if str[0] != '\\' {
			buf.WriteByte(str[0])
			str = str[1:]
			continue
		}

		if len(str) == 1 {
			lexer.error("unfinished string")
		}

		switch str[1] {
		case 'a':
			buf.WriteByte('\a')
			str = str[2:]
			continue
		case 'b':
			buf.WriteByte('\b')
			str = str[2:]
			continue
		case 'f':
			buf.WriteByte('\f')
			str = str[2:]
			continue
		case 'n', '\n':
			buf.WriteByte('\n')
			str = str[2:]
			continue
		case 'r':
			buf.WriteByte('\r')
			str = str[2:]
			continue
		case 't':
			buf.WriteByte('\t')
			str = str[2:]
			continue
		case 'v':
			buf.WriteByte('\v')
			str = str[2:]
			continue
		case '"':
			buf.WriteByte('"')
			str = str[2:]
			continue
		case '\'':
			buf.WriteByte('\'')
			str = str[2:]
			continue
		case '\\':
			buf.WriteByte('\\')
			str = str[2:]
			continue
		case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9': // \ddd
			if found := reDecEscapeSeq.FindString(str); found != "" {
				d, _ := strconv.ParseInt(found[1:], 10, 32)
				if d <= 0xFF {
					buf.WriteByte(byte(d))
					str = str[len(found):]
					continue
				}
				lexer.error("decimal escape too large near '%s'", found)
			}
		case 'x': // \xXX
			if found := reHexEscapeSeq.FindString(str); found != "" {
				d, _ := strconv.ParseInt(found[2:], 16, 32)
				buf.WriteByte(byte(d))
				str = str[len(found):]
				continue
			}
		case 'u': // \u{XXX}
			if found := reUnicodeEscapeSeq.FindString(str); found != "" {
				d, err := strconv.ParseInt(found[3:len(found)-1], 16, 32)
				if err == nil && d <= 0x10FFFF {
					buf.WriteRune(rune(d))
					str = str[len(found):]
					continue
				}
				lexer.error("UTF-8 value too large near '%s'", found)
			}
		case 'z':
			str = str[2:]
			for len(str) > 0 && isWhiteSpace(str[0]) { // todo
				str = str[1:]
			}
			continue
		}
		lexer.error("invalid escape sequence near '\\%c'", str[1])
	}

	return buf.String()
}

14.3.7 数字字面量

同样, 用正则表达式完成.

14.3.8 标识符和关键字

14.4 LookAhead() 和其他方法

不跳过并判断下一个 token 是什么类型.

func (lexer *Lexer) LookAhead() int {
	if lexer.nextTokenLine > 0 {
		return lexer.nextTokenKind
	}
	currentLine := lexer.line
	line, kind, token := lexer.NextToken()
	lexer.line = currentLine
	lexer.nextTokenLine = line
	lexer.nextTokenKind = kind
	lexer.nextToken = token
	return kind
}

Chapter 15, 抽象语法树

15.1 抽象语法树介绍

通常用 EBNF (扩展巴科斯范式, Extended Backus-Naur Form) 描述.

15.2 Chunk 和块

chunk ::= block

block ::= {stat} [retstat]
retstat ::= return [explist] [';']
explist ::= exp {',' exp}

{A} 表示 0 或多次, [A] 表示 0 或 1 次.

Block 结构体:

package ast

// chunk ::= block
// type Chunk *Block

// block ::= {stat} [retstat]
// retstat ::= return [explist] [‘;’]
// explist ::= exp {‘,’ exp}
type Block struct {
	LastLine int
	Stats    []Stat
	RetExps  []Exp
}

15.3 语句

在命令式编程语言里, 语句 (statement) 的最基本的执行单位, 表达式 (expression) 则是构成语句的要素之一. 语句只能执行不能用于求值, 表达式只能用于求值不能单独执行.

stat ::= ';'
    | varlist '=' explist
    | functioncall
    | label
    | break
    | goto Name
    | do block end
    | while exp do block end
    | repeat block until end
    | if exp then block {elseif exp then block} [else block] end
    | for Name '=' exp ',' exp [',' exp] do block end
    | for namelist in explist do block end
    | function funcname funcbody
    | local function Name funcbody
    | local namelist ['=' explist]

15.3.1 简单语句

type EmptyStat struct{}            // ‘;’
type BreakStat struct{ Line int }  // break
type DoStat struct{ Block *Block } // do block end
type FuncCallStat = FuncCallExp    // functioncall


15.3.2 while 和 repeat 语句

while exp do block end
repeat block until exp
// while exp do block end
type WhileStat struct {
	Exp   Exp
	Block *Block
}

// repeat block until exp
type RepeatStat struct {
	Block *Block
	Exp   Exp
}

本是上都是包含一个 exp 和一个 block.

15.3.3 if 语句

if exp then block {elseif exp then block} [else block] end

为了简化需要改写成:

if exp then block {elseif exp then block} [elseif true then block] end

如果把 elseif 块合并, EBNF 可以简化为

if exp then block {elseif exp then block} end

这个转化会在语法分析阶段进行.

// if exp then block {elseif exp then block} [else block] end
type IfStat struct {
	Exps   []Exp
	Blocks []*Block
}

15.3.4 数值 for 循环语句

for Name '=' exp ',' exp [',' exp] do block end
// for Name ‘=’ exp ‘,’ exp [‘,’ exp] do block end
type ForNumStat struct {
	LineOfFor int
	LineOfDo  int
	VarName   string
	InitExp   Exp
	LimitExp  Exp
	StepExp   Exp
	Block     *Block
}

15.3.5 通用 for 循环语句

for namelist in explist do block end
namelist ::= Name {',' Name}
explist ::= exp {',' exp}
// for namelist in explist do block end
// namelist ::= Name {‘,’ Name}
// explist ::= exp {‘,’ exp}
type ForInStat struct {
	LineOfDo int
	NameList []string
	ExpList  []Exp
	Block    *Block
}


记录行号基本都是为了代码生成.

15.3.6 局部变量声明语句

local namelist ['=' explist]
namelist ::= Name {',' Name}
explist ::= exp {',' exp}
// local namelist [‘=’ explist]
// namelist ::= Name {‘,’ Name}
// explist ::= exp {‘,’ exp}
type LocalVarDeclStat struct {
	LastLine int
	NameList []string
	ExpList  []Exp
}

15.3.7 赋值语句

varlist ‘=’ explist
varlist ::= var {‘,’ var}
var ::=  Name | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name
// varlist ‘=’ explist
// varlist ::= var {‘,’ var}
// var ::=  Name | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name
type AssignStat struct {
	LastLine int
	VarList  []Exp
	ExpList  []Exp
}

15.3.8 非局部函数定义语句

function ::= function funcbody
funcname ::= Name {`.´ Name} [`:´ Name]
funcbody ::= `(´ [parlist] `)´ block end

15.3.9 局部函数定义语句

local function Name funcbody 

local function f (params) body end 会被转化为 local f; f = function (params) body end

// local function Name funcbody
type LocalFuncDefStat struct {
	Name string
	Exp  *FuncDefExp
}

15.4 表达式

exp ::=  nil | false | true | Number | String | `...´ | function | prefixexp | tableconstructor | exp binop exp | unop exp 

15.4.1 简单表达式

type NilExp struct{ Line int }    // nil
type TrueExp struct{ Line int }   // true
type FalseExp struct{ Line int }  // false
type VarargExp struct{ Line int } // ...

// Numeral
type IntegerExp struct {
	Line int
	Val  int64
}
type FloatExp struct {
	Line int
	Val  float64
}

// LiteralString
type StringExp struct {
	Line int
	Str  string
}

基本都包含行号, 有值的包含值.

15.4.2 运算符表达式

// unop exp
type UnopExp struct {
	Line int // line of operator
	Op   int // operator
	Exp  Exp
}

// exp1 op exp2
type BinopExp struct {
	Line int // line of operator
	Op   int // operator
	Exp1 Exp
	Exp2 Exp
}

type ConcatExp struct {
	Line int // line of last ..
	Exps []Exp
}

15.4.3 表构造表达式

// tableconstructor ::= ‘{’ [fieldlist] ‘}’
// fieldlist ::= field {fieldsep field} [fieldsep]
// field ::= ‘[’ exp ‘]’ ‘=’ exp | Name ‘=’ exp | exp
// fieldsep ::= ‘,’ | ‘;’
type TableConstructorExp struct {
	Line     int // line of `{` ?
	LastLine int // line of `}`
	KeyExps  []Exp
	ValExps  []Exp
}

15.4.4 函数定义表达式

// functiondef ::= function funcbody
// funcbody ::= ‘(’ [parlist] ‘)’ block end
// parlist ::= namelist [‘,’ ‘...’] | ‘...’
// namelist ::= Name {‘,’ Name}
type FuncDefExp struct {
	Line     int
	LastLine int // line of `end`
	ParList  []string
	IsVararg bool
	Block    *Block
}

15.4.5 前缀表达式

/*
prefixexp ::= Name |
              ‘(’ exp ‘)’ |
              prefixexp ‘[’ exp ‘]’ |
              prefixexp ‘.’ Name |
              prefixexp ‘:’ Name args |
              prefixexp args
*/

15.4.6 圆括号表达式

type ParensExp struct {
	Exp Exp
}

15.4.7 表访问表达式

type TableAccessExp struct {
	LastLine  int // line of `]` ?
	PrefixExp Exp
	KeyExp    Exp
}

15.4.8 函数调用表达式

functioncall ::=  prefixexp args | prefixexp `:´ Name args 
args ::=  `(´ [explist] `)´ | tableconstructor | String 

type FuncCallExp struct {
	Line      int // line of `(` ?
	LastLine  int // line of ')'
	PrefixExp Exp
	NameExp   *StringExp
	Args      []Exp
}

15.5 本章小结

这里是 lua 语法的 EBNF 汇总:

chunk = {stat [`;´]} [laststat [`;´]].

block ::= chunk

stat ::=  varlist `=´ explist | 
     functioncall | 
     do block end | 
     while exp do block end | 
     repeat block until exp | 
     if exp then block {elseif exp then block} [else block] end | 
     for Name `=´ exp `,´ exp [`,´ exp] do block end | 
     for namelist in explist do block end | 
     function funcname funcbody | 
     local function Name funcbody | 
     local namelist [`=´ explist] 

laststat ::= return [explist] | break

funcname ::= Name {`.´ Name} [`:´ Name]

varlist ::= var {`,´ var}

var ::=  Name | prefixexp `[´ exp `]´ | prefixexp `.´ Name 

namelist ::= Name {`,´ Name}

explist ::= {exp `,´} exp

exp ::=  nil | false | true | Number | String | `...´ | function | prefixexp | tableconstructor | exp binop exp | unop exp 

prefixexp ::= var | functioncall | `(´ exp `)´

functioncall ::=  prefixexp args | prefixexp `:´ Name args 

args ::=  `(´ [explist] `)´ | tableconstructor | String 

function ::= function funcbody

funcbody ::= `(´ [parlist] `)´ block end

parlist ::= namelist [`,´ `...´] | `...´

tableconstructor ::= `{´ [fieldlist] `}´

fieldlist ::= field {fieldsep field} [fieldsep]

field ::= `[´ exp `]´ `=´ exp | Name `=´ exp | exp

fieldsep ::= `,´ | `;´

binop ::= `+´ | `-´ | `*´ | `/´ | `^´ | `%´ | `..´ | `<´ | `<=´ | `>´ | `>=´ | `==´ | `~=´ | and | or

unop ::= `-´ | not | `#´

原始文档在: http://www.lua.org/manual/5.1/manual.html#8

Chapter 16, 语法分析

16.1 语法分析介绍

16.1.1 歧义

讲了 C 语言 EBNF 描述中 if else 嵌套情况下的歧义问题. C 语言是通过定义 else 与最近的 if 结合来避免这个问题的.

Lua 二元运算符则是通过运算优先级解决歧义的.

16.1.2 前瞻和回溯

如果上下文无关语言 L 不需要借助回溯就可以完成解析, 那么成为确定性 (Deterministic) 语言. 确定性语言一定没有歧义. 上下文无关包含无歧义包含确定性.

16.1.3 解析方式

使用自顶向下 (Top-Down) 方法与使用自底向上 (Bottom-Up) 都可以构造语法树. 自底向上包括 LR 解析器和 CYK 解析器等. 自顶向下解析器包括 LL 解析器和***递归下降 (Recursive Descent Parser) 解析器***

16.2 解析块

Lua 脚本实际上就是个代码块, 解析结果应该是一个 Block 结构体实例.

// block ::= {stat} [retstat]
func parseBlock(lexer *Lexer) *Block {
	return &Block{
		Stats:    parseStats(lexer),
		RetExps:  parseRetExps(lexer),
		LastLine: lexer.Line(),
	}
}

parseStats() 解析语句序列, parseRetExps() 解析可选的返回语句, 并记录末尾行号.

func parseStats(lexer *Lexer) []Stat {
	stats := make([]Stat, 0, 8)
	for !_isReturnOrBlockEnd(lexer.LookAhead()) {
		stat := parseStat(lexer)
		if _, ok := stat.(*EmptyStat); !ok {
			stats = append(stats, stat)
		}
	}
	return stats
}

func _isReturnOrBlockEnd(tokenKind int) bool {
	switch tokenKind {
	case TOKEN_KW_RETURN, TOKEN_EOF, TOKEN_KW_END,
		TOKEN_KW_ELSE, TOKEN_KW_ELSEIF, TOKEN_KW_UNTIL:
		return true
	}
	return false
}

_isReturnOrBlockEnd 确定了 block 的结束.

// retstat ::= return [explist] [‘;’]
// explist ::= exp {‘,’ exp}
func parseRetExps(lexer *Lexer) []Exp {
	if lexer.LookAhead() != TOKEN_KW_RETURN {
		return nil
	}

	lexer.NextToken()
	switch lexer.LookAhead() {
	case TOKEN_EOF, TOKEN_KW_END,
		TOKEN_KW_ELSE, TOKEN_KW_ELSEIF, TOKEN_KW_UNTIL:
		return []Exp{}
	case TOKEN_SEP_SEMI:
		lexer.NextToken()
		return []Exp{}
	default:
		exps := parseExpList(lexer)
		if lexer.LookAhead() == TOKEN_SEP_SEMI {
			lexer.NextToken()
		}
		return exps
	}
}

接下来是表达式的 parser

// explist ::= exp {‘,’ exp}
func parseExpList(lexer *Lexer) []Exp {
	exps := make([]Exp, 0, 4)
	exps = append(exps, parseExp(lexer))
	for lexer.LookAhead() == TOKEN_SEP_COMMA {
		lexer.NextToken()
		exps = append(exps, parseExp(lexer))
	}
	return exps
}

16.3 解析语句

stat ::=  varlist `=´ explist | 
     functioncall | 
     do block end | 
     while exp do block end | 
     repeat block until exp | 
     if exp then block {elseif exp then block} [else block] end | 
     for Name `=´ exp `,´ exp [`,´ exp] do block end | 
     for namelist in explist do block end | 
     function funcname funcbody | 
     local function Name funcbody | 
     local namelist [`=´ explist] 

这里可以看到需要大量前瞻.

/*
stat ::=  ‘;’
	| break
	| ‘::’ Name ‘::’
	| goto Name
	| do block end
	| while exp do block end
	| repeat block until exp
	| if exp then block {elseif exp then block} [else block] end
	| for Name ‘=’ exp ‘,’ exp [‘,’ exp] do block end
	| for namelist in explist do block end
	| function funcname funcbody
	| local function Name funcbody
	| local namelist [‘=’ explist]
	| varlist ‘=’ explist
	| functioncall
*/
func parseStat(lexer *Lexer) Stat {
	switch lexer.LookAhead() {
	case TOKEN_SEP_SEMI:
		return parseEmptyStat(lexer)
	case TOKEN_KW_BREAK:
		return parseBreakStat(lexer)
	case TOKEN_SEP_LABEL:
		return parseLabelStat(lexer)
	case TOKEN_KW_GOTO:
		return parseGotoStat(lexer)
	case TOKEN_KW_DO:
		return parseDoStat(lexer)
	case TOKEN_KW_WHILE:
		return parseWhileStat(lexer)
	case TOKEN_KW_REPEAT:
		return parseRepeatStat(lexer)
	case TOKEN_KW_IF:
		return parseIfStat(lexer)
	case TOKEN_KW_FOR:
		return parseForStat(lexer)
	case TOKEN_KW_FUNCTION:
		return parseFuncDefStat(lexer)
	case TOKEN_KW_LOCAL:
		return parseLocalAssignOrFuncDefStat(lexer)
	default:
		return parseAssignOrFuncCallStat(lexer)
	}
}

即进行一次前瞻, 然后尝试进行匹配.

16.3.1 简单语句

包括 空语句, break, label goto, do, while, repeat.

// ;
func parseEmptyStat(lexer *Lexer) *EmptyStat {
	lexer.NextTokenOfKind(TOKEN_SEP_SEMI)
	return _statEmpty
}

// break
func parseBreakStat(lexer *Lexer) *BreakStat {
	lexer.NextTokenOfKind(TOKEN_KW_BREAK)
	return &BreakStat{lexer.Line()}
}

// ‘::’ Name ‘::’
func parseLabelStat(lexer *Lexer) *LabelStat {
	lexer.NextTokenOfKind(TOKEN_SEP_LABEL) // ::
	line, name := lexer.NextIdentifier()   // name
	lexer.NextTokenOfKind(TOKEN_SEP_LABEL) // ::
	return &LabelStat{line, name}
}

// goto Name
func parseGotoStat(lexer *Lexer) *GotoStat {
	lexer.NextTokenOfKind(TOKEN_KW_GOTO) // goto
	line, name := lexer.NextIdentifier() // name
	return &GotoStat{line, name}
}

// do block end
func parseDoStat(lexer *Lexer) *DoStat {
	lexer.NextTokenOfKind(TOKEN_KW_DO)  // do
	block := parseBlock(lexer)          // block
	lexer.NextTokenOfKind(TOKEN_KW_END) // end
	return &DoStat{block}
}

// while exp do block end
func parseWhileStat(lexer *Lexer) *WhileStat {
	lexer.NextTokenOfKind(TOKEN_KW_WHILE) // while
	exp := parseExp(lexer)                // exp
	lexer.NextTokenOfKind(TOKEN_KW_DO)    // do
	block := parseBlock(lexer)            // block
	lexer.NextTokenOfKind(TOKEN_KW_END)   // end
	return &WhileStat{exp, block}
}

// repeat block until exp
func parseRepeatStat(lexer *Lexer) *RepeatStat {
	lexer.NextTokenOfKind(TOKEN_KW_REPEAT) // repeat
	block := parseBlock(lexer)             // block
	lexer.NextTokenOfKind(TOKEN_KW_UNTIL)  // until
	exp := parseExp(lexer)                 // exp
	return &RepeatStat{block, exp}
}

具体处理逻辑.

这里的 parseBlock() 与 parseDoStat() 互相调用, 因此被称作递归下降.

16.3.2 if 语句

// if exp then block {elseif exp then block} [else block] end
type IfStat struct {
	Exps   []Exp
	Blocks []*Block
}

// if exp then block {elseif exp then block} [else block] end
func parseIfStat(lexer *Lexer) *IfStat {
	exps := make([]Exp, 0, 4)
	blocks := make([]*Block, 0, 4)

	lexer.NextTokenOfKind(TOKEN_KW_IF)         // if
	exps = append(exps, parseExp(lexer))       // exp
	lexer.NextTokenOfKind(TOKEN_KW_THEN)       // then
	blocks = append(blocks, parseBlock(lexer)) // block

	for lexer.LookAhead() == TOKEN_KW_ELSEIF {
		lexer.NextToken()                          // elseif
		exps = append(exps, parseExp(lexer))       // exp
		lexer.NextTokenOfKind(TOKEN_KW_THEN)       // then
		blocks = append(blocks, parseBlock(lexer)) // block
	}

	// else block => elseif true then block
	if lexer.LookAhead() == TOKEN_KW_ELSE {
		lexer.NextToken()                           // else
		exps = append(exps, &TrueExp{lexer.Line()}) //
		blocks = append(blocks, parseBlock(lexer))  // block
	}

	lexer.NextTokenOfKind(TOKEN_KW_END) // end
	return &IfStat{exps, blocks}
}

源码写的很流畅.

16.3.3 for 循环语句

// for Name ‘=’ exp ‘,’ exp [‘,’ exp] do block end
// for namelist in explist do block end
func parseForStat(lexer *Lexer) Stat {
	lineOfFor, _ := lexer.NextTokenOfKind(TOKEN_KW_FOR)
	_, name := lexer.NextIdentifier()
	if lexer.LookAhead() == TOKEN_OP_ASSIGN {
		return _finishForNumStat(lexer, lineOfFor, name)
	} else {
		return _finishForInStat(lexer, name)
	}
}

// for Name ‘=’ exp ‘,’ exp [‘,’ exp] do block end
func _finishForNumStat(lexer *Lexer, lineOfFor int, varName string) *ForNumStat {
	lexer.NextTokenOfKind(TOKEN_OP_ASSIGN) // for name =
	initExp := parseExp(lexer)             // exp
	lexer.NextTokenOfKind(TOKEN_SEP_COMMA) // ,
	limitExp := parseExp(lexer)            // exp

	var stepExp Exp
	if lexer.LookAhead() == TOKEN_SEP_COMMA {
		lexer.NextToken()         // ,
		stepExp = parseExp(lexer) // exp
	} else {
		stepExp = &IntegerExp{lexer.Line(), 1}
	}

	lineOfDo, _ := lexer.NextTokenOfKind(TOKEN_KW_DO) // do
	block := parseBlock(lexer)                        // block
	lexer.NextTokenOfKind(TOKEN_KW_END)               // end

	return &ForNumStat{lineOfFor, lineOfDo,
		varName, initExp, limitExp, stepExp, block}
}

// for namelist in explist do block end
// namelist ::= Name {‘,’ Name}
// explist ::= exp {‘,’ exp}
func _finishForInStat(lexer *Lexer, name0 string) *ForInStat {
	nameList := _finishNameList(lexer, name0)         // for namelist
	lexer.NextTokenOfKind(TOKEN_KW_IN)                // in
	expList := parseExpList(lexer)                    // explist
	lineOfDo, _ := lexer.NextTokenOfKind(TOKEN_KW_DO) // do
	block := parseBlock(lexer)                        // block
	lexer.NextTokenOfKind(TOKEN_KW_END)               // end
	return &ForInStat{lineOfDo, nameList, expList, block}
}

// namelist ::= Name {‘,’ Name}
func _finishNameList(lexer *Lexer, name0 string) []string {
	names := []string{name0}
	for lexer.LookAhead() == TOKEN_SEP_COMMA {
		lexer.NextToken()                 // ,
		_, name := lexer.NextIdentifier() // Name
		names = append(names, name)
	}
	return names
}

16.3.4 局部变量声明和函数定义语句

即 local 语句.

// local function Name funcbody
// local namelist [‘=’ explist]
func parseLocalAssignOrFuncDefStat(lexer *Lexer) Stat {
	lexer.NextTokenOfKind(TOKEN_KW_LOCAL)
	if lexer.LookAhead() == TOKEN_KW_FUNCTION {
		return _finishLocalFuncDefStat(lexer)
	} else {
		return _finishLocalVarDeclStat(lexer)
	}
}

即跳过 local 查看前面是 function 则是初始化局部函数, 如果是变量则初始化局部变量.

/*
http://www.lua.org/manual/5.3/manual.html#3.4.11

function f() end          =>  f = function() end
function t.a.b.c.f() end  =>  t.a.b.c.f = function() end
function t.a.b.c:f() end  =>  t.a.b.c.f = function(self) end
local function f() end    =>  local f; f = function() end

The statement `local function f () body end`
translates to `local f; f = function () body end`
not to `local f = function () body end`
(This only makes a difference when the body of the function
 contains references to f.)
*/
// local function Name funcbody
func _finishLocalFuncDefStat(lexer *Lexer) *LocalFuncDefStat {
	lexer.NextTokenOfKind(TOKEN_KW_FUNCTION) // local function
	_, name := lexer.NextIdentifier()        // name
	fdExp := parseFuncDefExp(lexer)          // funcbody
	return &LocalFuncDefStat{name, fdExp}
}

// local namelist [‘=’ explist]
func _finishLocalVarDeclStat(lexer *Lexer) *LocalVarDeclStat {
	_, name0 := lexer.NextIdentifier()        // local Name
	nameList := _finishNameList(lexer, name0) // { , Name }
	var expList []Exp = nil
	if lexer.LookAhead() == TOKEN_OP_ASSIGN {
		lexer.NextToken()             // ==
		expList = parseExpList(lexer) // explist
	}
	lastLine := lexer.Line()
	return &LocalVarDeclStat{lastLine, nameList, expList}
}

16.3.5 赋值和函数调用语句

// varlist ‘=’ explist
// functioncall
func parseAssignOrFuncCallStat(lexer *Lexer) Stat {
	prefixExp := parsePrefixExp(lexer)
	if fc, ok := prefixExp.(*FuncCallExp); ok {
		return fc
	} else {
		return parseAssignStat(lexer, prefixExp)
	}
}
// varlist ‘=’ explist |
func parseAssignStat(lexer *Lexer, var0 Exp) *AssignStat {
	varList := _finishVarList(lexer, var0) // varlist
	lexer.NextTokenOfKind(TOKEN_OP_ASSIGN) // =
	expList := parseExpList(lexer)         // explist
	lastLine := lexer.Line()
	return &AssignStat{lastLine, varList, expList}
}

// varlist ::= var {‘,’ var}
func _finishVarList(lexer *Lexer, var0 Exp) []Exp {
	vars := []Exp{_checkVar(lexer, var0)}      // var
	for lexer.LookAhead() == TOKEN_SEP_COMMA { // {
		lexer.NextToken()                          // ,
		exp := parsePrefixExp(lexer)               // var
		vars = append(vars, _checkVar(lexer, exp)) //
	} // }
	return vars
}

// var ::=  Name | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name
func _checkVar(lexer *Lexer, exp Exp) Exp {
	switch exp.(type) {
	case *NameExp, *TableAccessExp:
		return exp
	}
	lexer.NextTokenOfKind(-1) // trigger error
	panic("unreachable!")
}

16.3.6 非局部函数定义语句

// function funcname funcbody
// funcname ::= Name {‘.’ Name} [‘:’ Name]
// funcbody ::= ‘(’ [parlist] ‘)’ block end
// parlist ::= namelist [‘,’ ‘...’] | ‘...’
// namelist ::= Name {‘,’ Name}
func parseFuncDefStat(lexer *Lexer) *AssignStat {
	lexer.NextTokenOfKind(TOKEN_KW_FUNCTION) // function
	fnExp, hasColon := _parseFuncName(lexer) // funcname
	fdExp := parseFuncDefExp(lexer)          // funcbody
	if hasColon {                            // insert self
		fdExp.ParList = append(fdExp.ParList, "")
		copy(fdExp.ParList[1:], fdExp.ParList)
		fdExp.ParList[0] = "self"
	}

	return &AssignStat{
		LastLine: fdExp.Line,
		VarList:  []Exp{fnExp},
		ExpList:  []Exp{fdExp},
	}
}

// funcname ::= Name {‘.’ Name} [‘:’ Name]
func _parseFuncName(lexer *Lexer) (exp Exp, hasColon bool) {
	line, name := lexer.NextIdentifier()
	exp = &NameExp{line, name}

	for lexer.LookAhead() == TOKEN_SEP_DOT {
		lexer.NextToken()
		line, name := lexer.NextIdentifier()
		idx := &StringExp{line, name}
		exp = &TableAccessExp{line, exp, idx}
	}
	if lexer.LookAhead() == TOKEN_SEP_COLON {
		lexer.NextToken()
		line, name := lexer.NextIdentifier()
		idx := &StringExp{line, name}
		exp = &TableAccessExp{line, exp, idx}
		hasColon = true
	}

	return
}

16.4 解析表达式

将表达式解析分解:

/*
exp ::=  nil | false | true | Numeral | LiteralString | ‘...’ | functiondef |
	 prefixexp | tableconstructor | exp binop exp | unop exp
*/
/*
exp   ::= exp12
exp12 ::= exp11 {or exp11}
exp11 ::= exp10 {and exp10}
exp10 ::= exp9 {(‘<’ | ‘>’ | ‘<=’ | ‘>=’ | ‘~=’ | ‘==’) exp9}
exp9  ::= exp8 {‘|’ exp8}
exp8  ::= exp7 {‘~’ exp7}
exp7  ::= exp6 {‘&’ exp6}
exp6  ::= exp5 {(‘<<’ | ‘>>’) exp5}
exp5  ::= exp4 {‘..’ exp4}
exp4  ::= exp3 {(‘+’ | ‘-’) exp3}
exp3  ::= exp2 {(‘*’ | ‘/’ | ‘//’ | ‘%’) exp2}
exp2  ::= {(‘not’ | ‘#’ | ‘-’ | ‘~’)} exp1
exp1  ::= exp0 {‘^’ exp2}
exp0  ::= nil | false | true | Numeral | LiteralString
		| ‘...’ | functiondef | prefixexp | tableconstructor
*/
func parseExp(lexer *Lexer) Exp {
	return parseExp12(lexer)
}

16.4.1 运算符表达式

// x or y
func parseExp12(lexer *Lexer) Exp {
	exp := parseExp11(lexer)
	for lexer.LookAhead() == TOKEN_OP_OR {
		line, op, _ := lexer.NextToken()
		lor := &BinopExp{line, op, exp, parseExp11(lexer)}
		exp = optimizeLogicalOr(lor)
	}
	return exp
}

这里要根据运算符的结合性书写具体逻辑. 尤其是 ...^ 具有向右结合性. 下面是 ^ 的具体例子:

// x ^ y
func parseExp1(lexer *Lexer) Exp { // pow is right associative
	exp := parseExp0(lexer)
	if lexer.LookAhead() == TOKEN_OP_POW {
		line, op, _ := lexer.NextToken()
		exp = &BinopExp{line, op, exp, parseExp2(lexer)}
	}
	return optimizePow(exp)
}

func optimizePow(exp Exp) Exp {
	if binop, ok := exp.(*BinopExp); ok {
		if binop.Op == TOKEN_OP_POW {
			binop.Exp2 = optimizePow(binop.Exp2)
		}
		return optimizeArithBinaryOp(binop)
	}
	return exp
}

^ 是可以连用的, 比如 a^b^c.

下面是一元运算符的表达式解析:

// exp2  ::= {(‘not’ | ‘#’ | ‘-’ | ‘~’)} exp1
// unary
func parseExp2(lexer *Lexer) Exp {
	switch lexer.LookAhead() {
	case TOKEN_OP_UNM, TOKEN_OP_BNOT, TOKEN_OP_LEN, TOKEN_OP_NOT:
		line, op, _ := lexer.NextToken()
		exp := &UnopExp{line, op, parseExp2(lexer)}
		return optimizeUnaryOp(exp)
	}
	return parseExp1(lexer)
}

拼接运算符则是:

// a .. b
func parseExp5(lexer *Lexer) Exp {
	exp := parseExp4(lexer)
	if lexer.LookAhead() != TOKEN_OP_CONCAT {
		return exp
	}

	line := 0
	exps := []Exp{exp}
	for lexer.LookAhead() == TOKEN_OP_CONCAT {
		line, _, _ = lexer.NextToken()
		exps = append(exps, parseExp4(lexer))
	}
	return &ConcatExp{line, exps}
}

拼接运算符特殊处理, 采用多叉树方便统一运算。

16.4.2 非运算表达式

func parseExp0(lexer *Lexer) Exp {
	switch lexer.LookAhead() {
	case TOKEN_VARARG: // ...
		line, _, _ := lexer.NextToken()
		return &VarargExp{line}
	case TOKEN_KW_NIL: // nil
		line, _, _ := lexer.NextToken()
		return &NilExp{line}
	case TOKEN_KW_TRUE: // true
		line, _, _ := lexer.NextToken()
		return &TrueExp{line}
	case TOKEN_KW_FALSE: // false
		line, _, _ := lexer.NextToken()
		return &FalseExp{line}
	case TOKEN_STRING: // LiteralString
		line, _, token := lexer.NextToken()
		return &StringExp{line, token}
	case TOKEN_NUMBER: // Numeral
		return parseNumberExp(lexer)
	case TOKEN_SEP_LCURLY: // tableconstructor
		return parseTableConstructorExp(lexer)
	case TOKEN_KW_FUNCTION: // functiondef
		lexer.NextToken()
		return parseFuncDefExp(lexer)
	default: // prefixexp
		return parsePrefixExp(lexer)
	}
}

单个表达式都很简单, 直接判断类型然后返回类型即可. 其中数字字面量会复杂一些.

func parseNumberExp(lexer *Lexer) Exp {
	line, _, token := lexer.NextToken()
	if i, ok := number.ParseInteger(token); ok {
		return &IntegerExp{line, i}
	} else if f, ok := number.ParseFloat(token); ok {
		return &FloatExp{line, f}
	} else { // todo
		panic("not a number: " + token)
	}
}

16.4.3 函数定义表达式

// functiondef ::= function funcbody
// funcbody ::= ‘(’ [parlist] ‘)’ block end
func parseFuncDefExp(lexer *Lexer) *FuncDefExp {
	line := lexer.Line()                               // function
	lexer.NextTokenOfKind(TOKEN_SEP_LPAREN)            // (
	parList, isVararg := _parseParList(lexer)          // [parlist]
	lexer.NextTokenOfKind(TOKEN_SEP_RPAREN)            // )
	block := parseBlock(lexer)                         // block
	lastLine, _ := lexer.NextTokenOfKind(TOKEN_KW_END) // end
	return &FuncDefExp{line, lastLine, parList, isVararg, block}
}

_parseParList() 用于参数列表解析.

// [parlist]
// parlist ::= namelist [‘,’ ‘...’] | ‘...’
func _parseParList(lexer *Lexer) (names []string, isVararg bool) {
	switch lexer.LookAhead() {
	case TOKEN_SEP_RPAREN:
		return nil, false
	case TOKEN_VARARG:
		lexer.NextToken()
		return nil, true
	}

	_, name := lexer.NextIdentifier()
	names = append(names, name)
	for lexer.LookAhead() == TOKEN_SEP_COMMA {
		lexer.NextToken()
		if lexer.LookAhead() == TOKEN_IDENTIFIER {
			_, name := lexer.NextIdentifier()
			names = append(names, name)
		} else {
			lexer.NextTokenOfKind(TOKEN_VARARG)
			isVararg = true
			break
		}
	}
	return
}

16.4.4 表构造表达式

// tableconstructor ::= ‘{’ [fieldlist] ‘}’
func parseTableConstructorExp(lexer *Lexer) *TableConstructorExp {
	line := lexer.Line()
	lexer.NextTokenOfKind(TOKEN_SEP_LCURLY)    // {
	keyExps, valExps := _parseFieldList(lexer) // [fieldlist]
	lexer.NextTokenOfKind(TOKEN_SEP_RCURLY)    // }
	lastLine := lexer.Line()
	return &TableConstructorExp{line, lastLine, keyExps, valExps}
}

// fieldlist ::= field {fieldsep field} [fieldsep]
func _parseFieldList(lexer *Lexer) (ks, vs []Exp) {
	if lexer.LookAhead() != TOKEN_SEP_RCURLY {
		k, v := _parseField(lexer)
		ks = append(ks, k)
		vs = append(vs, v)

		for _isFieldSep(lexer.LookAhead()) {
			lexer.NextToken()
			if lexer.LookAhead() != TOKEN_SEP_RCURLY {
				k, v := _parseField(lexer)
				ks = append(ks, k)
				vs = append(vs, v)
			} else {
				break
			}
		}
	}
	return
}

// fieldsep ::= ‘,’ | ‘;’
func _isFieldSep(tokenKind int) bool {
	return tokenKind == TOKEN_SEP_COMMA || tokenKind == TOKEN_SEP_SEMI
}

// field ::= ‘[’ exp ‘]’ ‘=’ exp | Name ‘=’ exp | exp
func _parseField(lexer *Lexer) (k, v Exp) {
	if lexer.LookAhead() == TOKEN_SEP_LBRACK {
		lexer.NextToken()                       // [
		k = parseExp(lexer)                     // exp
		lexer.NextTokenOfKind(TOKEN_SEP_RBRACK) // ]
		lexer.NextTokenOfKind(TOKEN_OP_ASSIGN)  // =
		v = parseExp(lexer)                     // exp
		return
	}

	exp := parseExp(lexer)
	if nameExp, ok := exp.(*NameExp); ok {
		if lexer.LookAhead() == TOKEN_OP_ASSIGN {
			// Name ‘=’ exp => ‘[’ LiteralString ‘]’ = exp
			lexer.NextToken()
			k = &StringExp{nameExp.Line, nameExp.Name}
			v = parseExp(lexer)
			return
		}
	}

	return nil, exp
}

16.4.5 前缀表达式

// prefixexp ::= var | functioncall | ‘(’ exp ‘)’
// var ::=  Name | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name
// functioncall ::=  prefixexp args | prefixexp ‘:’ Name args

/*
prefixexp ::= Name
	| ‘(’ exp ‘)’
	| prefixexp ‘[’ exp ‘]’
	| prefixexp ‘.’ Name
	| prefixexp [‘:’ Name] args
*/
func parsePrefixExp(lexer *Lexer) Exp {
	var exp Exp
	if lexer.LookAhead() == TOKEN_IDENTIFIER {
		line, name := lexer.NextIdentifier() // Name
		exp = &NameExp{line, name}
	} else { // ‘(’ exp ‘)’
		exp = parseParensExp(lexer)
	}
	return _finishPrefixExp(lexer, exp)
}

func _finishPrefixExp(lexer *Lexer, exp Exp) Exp {
	for {
		switch lexer.LookAhead() {
		case TOKEN_SEP_LBRACK: // prefixexp ‘[’ exp ‘]’
			lexer.NextToken()                       // ‘[’
			keyExp := parseExp(lexer)               // exp
			lexer.NextTokenOfKind(TOKEN_SEP_RBRACK) // ‘]’
			exp = &TableAccessExp{lexer.Line(), exp, keyExp}
		case TOKEN_SEP_DOT: // prefixexp ‘.’ Name
			lexer.NextToken()                    // ‘.’
			line, name := lexer.NextIdentifier() // Name
			keyExp := &StringExp{line, name}
			exp = &TableAccessExp{line, exp, keyExp}
		case TOKEN_SEP_COLON, // prefixexp ‘:’ Name args
			TOKEN_SEP_LPAREN, TOKEN_SEP_LCURLY, TOKEN_STRING: // prefixexp args
			exp = _finishFuncCallExp(lexer, exp)
		default:
			return exp
		}
	}
	return exp
}

16.4.6 圆括号表达式

func parseParensExp(lexer *Lexer) Exp {
	lexer.NextTokenOfKind(TOKEN_SEP_LPAREN) // (
	exp := parseExp(lexer)                  // exp
	lexer.NextTokenOfKind(TOKEN_SEP_RPAREN) // )

	switch exp.(type) {
	case *VarargExp, *FuncCallExp, *NameExp, *TableAccessExp:
		return &ParensExp{exp}
	}

	// no need to keep parens
	return exp
}

运算优先级调节应该是放到了代码生成阶段.

16.4.7 函数调用表达式

// functioncall ::=  prefixexp args | prefixexp ‘:’ Name args
func _finishFuncCallExp(lexer *Lexer, prefixExp Exp) *FuncCallExp {
	nameExp := _parseNameExp(lexer)
	line := lexer.Line() // todo
	args := _parseArgs(lexer)
	lastLine := lexer.Line()
	return &FuncCallExp{line, lastLine, prefixExp, nameExp, args}
}

func _parseNameExp(lexer *Lexer) *StringExp {
	if lexer.LookAhead() == TOKEN_SEP_COLON {
		lexer.NextToken()
		line, name := lexer.NextIdentifier()
		return &StringExp{line, name}
	}
	return nil
}

// args ::=  ‘(’ [explist] ‘)’ | tableconstructor | LiteralString
func _parseArgs(lexer *Lexer) (args []Exp) {
	switch lexer.LookAhead() {
	case TOKEN_SEP_LPAREN: // ‘(’ [explist] ‘)’
		lexer.NextToken() // TOKEN_SEP_LPAREN
		if lexer.LookAhead() != TOKEN_SEP_RPAREN {
			args = parseExpList(lexer)
		}
		lexer.NextTokenOfKind(TOKEN_SEP_RPAREN)
	case TOKEN_SEP_LCURLY: // ‘{’ [fieldlist] ‘}’
		args = []Exp{parseTableConstructorExp(lexer)}
	default: // LiteralString
		line, str := lexer.NextTokenOfKind(TOKEN_STRING)
		args = []Exp{&StringExp{line, str}}
	}
	return
}

16.4.8 表达式优化

具体而言, 优化包括字面量参与的算术, 按位, 逻辑运算符表达式进行优化.

即运算符表达式如果能在编译期进行求值, 会直接进行求值.

一元运算符优化的例子:

// unary
func parseExp2(lexer *Lexer) Exp {
	switch lexer.LookAhead() {
	case TOKEN_OP_UNM, TOKEN_OP_BNOT, TOKEN_OP_LEN, TOKEN_OP_NOT:
		line, op, _ := lexer.NextToken()
		exp := &UnopExp{line, op, parseExp2(lexer)}
		return optimizeUnaryOp(exp)
	}
	return parseExp1(lexer)
}

func optimizeUnaryOp(exp *UnopExp) Exp {
	switch exp.Op {
	case TOKEN_OP_UNM:
		return optimizeUnm(exp)
	case TOKEN_OP_NOT:
		return optimizeNot(exp)
	case TOKEN_OP_BNOT:
		return optimizeBnot(exp)
	default:
		return exp
	}
}

具体到取负值的优化是:

func optimizeUnm(exp *UnopExp) Exp {
	switch x := exp.Exp.(type) { // number?
	case *IntegerExp:
		x.Val = -x.Val
		return x
	case *FloatExp:
		if x.Val != 0 {
			x.Val = -x.Val
			return x
		}
	}
	return exp
}

最后是整体的 Parser 入口.

func Parse(chunk, chunkName string) *Block {
	lexer := NewLexer(chunk, chunkName)
	block := parseBlock(lexer)
	lexer.NextTokenOfKind(TOKEN_EOF)
	return block
}

16.5 测试本章代码

16.6 本章小结

Chapter 17, 代码生成

17.1 定义 funcInfo 结构体

代码生成会分为两个阶段, 第一阶段对AST进行处理, 生成自定义内部结构, 第二个阶段把内部结构转换为函数原型.

17.1.1 常量表

type funcInfo struct {
	codeBuf
	parent    *funcInfo
	subFuncs  []*funcInfo
	usedRegs  int
	maxRegs   int
	scopeLv   int
	locVars   []*locVarInfo
	locNames  map[string]*locVarInfo
	upvalues  map[string]upvalInfo
	constants map[interface{}]int
	labels    map[string]labelInfo
	gotos     []*gotoInfo
	breaks    [][]int
	line      int
	lastLine  int
	numParams int
	isVararg  bool
}

constants 字段负责存储函数原型的常量表. 接下来是 indexOfConstant().

func (fi *funcInfo) indexOfConstant(k interface{}) int {
	if idx, found := fi.constants[k]; found {
		return idx
	}

	idx := len(fi.constants)
	fi.constants[k] = idx
	return idx
}

如果常量不在列表则会塞进去然后返回索引.

17.1.2 寄存器分配

type funcInfo struct {
	codeBuf

	usedRegs  int
	maxRegs   int
	
}

仅仅记录已分配的寄存器数量和最大寄存器数量即可. allocReg() 负责分配并更新最大寄存器数量, 并返回寄存器索引.

func (fi *funcInfo) allocReg() int {
	fi.usedRegs++
	if fi.usedRegs >= 255 {
		panic("function or expression needs too many registers")
	}
	if fi.usedRegs > fi.maxRegs {
		fi.maxRegs = fi.usedRegs
	}
	return fi.usedRegs - 1
}

freeReg() 用于回收.

func (fi *funcInfo) freeReg() {
	if fi.usedRegs <= 0 {
		panic("usedRegs <= 0 !")
	}
	fi.usedRegs--
}

allocRegs() 用于连续分配 n 个寄存器并返回第一个索引, freeRegs 则负责回收.

func (fi *funcInfo) allocRegs(n int) int {
	if n <= 0 {
		panic("n <= 0 !")
	}
	for i := 0; i < n; i++ {
		fi.allocReg()
	}
	return fi.usedRegs - n
}

func (fi *funcInfo) freeRegs(n int) {
	if n < 0 {
		panic("n < 0 !")
	}
	for i := 0; i < n; i++ {
		fi.freeReg()
	}
}

17.1.3 局部变量表

这里用单链表来串联同名的局部变量. 即 locVarInfo.

type locVarInfo struct {
	prev     *locVarInfo
	name     string
	scopeLv  int
	slot     int
	startPC  int
	endPC    int
	captured bool
}

name 为局部变量名, scopeLv 记录局部变量所在的作用于层次, slot 字段记录与局部变量名绑定的寄存器索引, captured 表示局部变量是否被闭包捕获.

type funcInfo struct {
	codeBuf
	parent    *funcInfo
	subFuncs  []*funcInfo
	usedRegs  int
    maxRegs   int
    
	scopeLv   int
	locVars   []*locVarInfo
    locNames  map[string]*locVarInfo
    
	upvalues  map[string]upvalInfo
	constants map[interface{}]int
	labels    map[string]labelInfo
	gotos     []*gotoInfo
	breaks    [][]int
	line      int
	lastLine  int
	numParams int
	isVararg  bool
}

在 funcInfo 中 scopeLv 记录当前作用域层次, locVars 按顺序记录函数内部声明的全部局部变量. locNames 记录当前生效的局部变量.

enterScope() 用于进入新的作用域.

func (fi *funcInfo) enterScope(breakable bool) {
	fi.scopeLv++
	if breakable {
		fi.breaks = append(fi.breaks, []int{})
	} else {
		fi.breaks = append(fi.breaks, nil)
    }
}

addLocVar() 方法在当前作用域添加一个局部变量, 返回其分配的寄存器索引.

func (fi *funcInfo) addLocVar(name string, startPC int) int {
	newVar := &locVarInfo{
		name:    name,
		prev:    fi.locNames[name],
		scopeLv: fi.scopeLv,
		slot:    fi.allocReg(), // alloc new register
		startPC: startPC,
		endPC:   0,
	}

	fi.locVars = append(fi.locVars, newVar)
	fi.locNames[name] = newVar

	return newVar.slot
}

slotOfLocVar() 方法检查局部变量名是否已经和某个寄存器绑定, 如果是则返回寄存器索引, 否则返回 -1.

func (fi *funcInfo) slotOfLocVar(name string) int {
	if locVar, found := fi.locNames[name]; found {
		return locVar.slot
	}
	return -1
}

exitScope() 用于退出作用于.

func (fi *funcInfo) exitScope(endPC int) {
	pendingBreakJmps := fi.breaks[len(fi.breaks)-1]
	fi.breaks = fi.breaks[:len(fi.breaks)-1]

	a := fi.getJmpArgA()
	for _, pc := range pendingBreakJmps {
		sBx := fi.pc() - pc
		i := (sBx+MAXARG_sBx)<<14 | a<<6 | OP_JMP
		fi.insts[pc] = uint32(i)
	}

	fi.fixGotoJmps()

	fi.scopeLv--
	for _, locVar := range fi.locNames {
		if locVar.scopeLv > fi.scopeLv { // out of scope
			locVar.endPC = endPC
			fi.removeLocVar(locVar)
		}
	}
}

判断离开了当前作用域后, 会移除 locVar. 使用 removeLocVar()

func (fi *funcInfo) removeLocVar(locVar *locVarInfo) {
	fi.freeReg()
	if locVar.prev == nil {
		delete(fi.locNames, locVar.name)
	} else if locVar.prev.scopeLv == locVar.scopeLv {
		fi.removeLocVar(locVar.prev)
	} else {
		fi.locNames[locVar.name] = locVar.prev
	}
}

涉及到移除, 同名重新绑定的过程.

17.1.4 Break 表

Break 可以打断循环, 处理难点有:

所以为了解决, 需要把跳转指令的地址记录在对应的 for, repeat, while 块里, 结束后再修复跳转的目标地址.

type funcInfo struct {

	breaks    [][]int

}

这里使用数组来记录循环块内待处理的跳转指令, 数字长度和块的深度对应. 通过判断是否为 nil 就可以判断对应的块是否是循环块. 这里需要修改 enterScope()

func (fi *funcInfo) enterScope(breakable bool) {
	fi.scopeLv++
	if breakable {
		fi.breaks = append(fi.breaks, []int{})
	} else {
		fi.breaks = append(fi.breaks, nil)
    }
}

addBreakJmp() 方法把 break 语句对应的跳转指令添加到最近的循环块里, 找不到则 painc().

func (fi *funcInfo) addBreakJmp(pc int) {
	for i := fi.scopeLv; i >= 0; i-- {
		if fi.breaks[i] != nil { // breakable
			fi.breaks[i] = append(fi.breaks[i], pc)
			return
		}
	}

	panic("<break> at line ? not inside a loop!")
}

exitScope() 也要增加相关代码:

func (fi *funcInfo) exitScope(endPC int) {
	pendingBreakJmps := fi.breaks[len(fi.breaks)-1]
	fi.breaks = fi.breaks[:len(fi.breaks)-1]

	a := fi.getJmpArgA()
	for _, pc := range pendingBreakJmps {
		sBx := fi.pc() - pc
		i := (sBx+MAXARG_sBx)<<14 | a<<6 | OP_JMP
		fi.insts[pc] = uint32(i)
	}

	fi.fixGotoJmps()

	fi.scopeLv--
	for _, locVar := range fi.locNames {
		if locVar.scopeLv > fi.scopeLv { // out of scope
			locVar.endPC = endPC
			fi.removeLocVar(locVar)
		}
	}
}

简单来讲, 在 enterScope 过程将 fi.breaks 增加, 然后 addBreakJmp 负责将跳转地址 (pc) 怼到当前 fi.breaks 的位置里面. 当 exitScope 时, 则进行递减流程.

17.1.5 Upvalue 表

即在当前作用域捕获外围函数中的局部变量. Upvalue 名只能绑定唯一的 Upvalue. 所以结构相对简单.

type upvalInfo struct {
	locVarSlot int
	upvalIndex int
	index      int
}

locVarSlot 记录该局部变量占用的寄存器索引, 如果为空则已经被外围函数捕获. upvalIndex 记录该 Upvalue 在直接外围函数 Upvalue 表中的索引. index 记录 Upvalue 在函数中出现的顺序.

type funcInfo struct {
	codeBuf
	parent    *funcInfo

    
	upvalues  map[string]upvalInfo

}

upvalues 即 Upvalue 表. parent 用于定位外围函数的局部变量表和 Upvalue 表.

func (fi *funcInfo) indexOfUpval(name string) int {
	if upval, ok := fi.upvalues[name]; ok {
		return upval.index
	}
	if fi.parent != nil {
		if locVar, found := fi.parent.locNames[name]; found {
			idx := len(fi.upvalues)
			fi.upvalues[name] = upvalInfo{locVar.slot, -1, idx}
			locVar.captured = true
			return idx
		}
		if uvIdx := fi.parent.indexOfUpval(name); uvIdx >= 0 {
			idx := len(fi.upvalues)
			fi.upvalues[name] = upvalInfo{-1, uvIdx, idx}
			return idx
		}
	}
	return -1
}

indexOfUpval 用于实施绑定并返回索引.

17.1.6 字节码

type codeBuf struct {
	insts    []uint32
	lineNums []uint32
}

insts 存储字节码.

func (cb *codeBuf) emitABC(line, opcode, a, b, c int) {
	i := b<<23 | c<<14 | a<<6 | opcode
	cb.insts = append(cb.insts, uint32(i))
	cb.lineNums = append(cb.lineNums, uint32(line))
}

func (cb *codeBuf) emitABx(line, opcode, a, bx int) {
	i := bx<<14 | a<<6 | opcode
	cb.insts = append(cb.insts, uint32(i))
	cb.lineNums = append(cb.lineNums, uint32(line))
}

func (cb *codeBuf) emitAsBx(line, opcode, a, sBx int) {
	i := (sBx+MAXARG_sBx)<<14 | a<<6 | opcode
	cb.insts = append(cb.insts, uint32(i))
	cb.lineNums = append(cb.lineNums, uint32(line))
}

func (cb *codeBuf) emitAx(line, opcode, ax int) {
	i := ax<<6 | opcode
	cb.insts = append(cb.insts, uint32(i))
	cb.lineNums = append(cb.lineNums, uint32(line))
}

这段代码用于将 line, opcode, abc 寄存器拼接成指令并放入 insts.

func (cb *codeBuf) fixSbx(pc, sBx int) {
	if sBx > 0 && sBx > MAXARG_sBx || sBx < 0 && -sBx > MAXARG_sBx {
		panic("control structure too long")
	}

	i := cb.insts[pc]
	i = i << 18 >> 18                  // clear sBx
	i = i | uint32(sBx+MAXARG_sBx)<<14 // reset sBx
	cb.insts[pc] = i
}

17.1.7 其他信息

定义 funcInfo 的生成函数.

func newFuncInfo(parent *funcInfo, fd *FuncDefExp) *funcInfo {
	return &funcInfo{
		codeBuf: codeBuf{
			insts:    make([]uint32, 0, 8),
			lineNums: make([]uint32, 0, 8),
		},
		parent:    parent,
		subFuncs:  []*funcInfo{},
		locVars:   make([]*locVarInfo, 0, 8),
		locNames:  map[string]*locVarInfo{},
		upvalues:  map[string]upvalInfo{},
		constants: map[interface{}]int{},
		labels:    map[string]labelInfo{},
		gotos:     nil,
		breaks:    make([][]int, 1),
		line:      fd.Line,
		lastLine:  fd.LastLine,
		numParams: len(fd.ParList),
		isVararg:  fd.IsVararg,
	}
}

17.2 编译块

func cgBlock(fi *funcInfo, node *Block) {
	for _, stat := range node.Stats {
		cgStat(fi, stat)
	}

	if node.RetExps != nil {
		cgRetStat(fi, node.RetExps, node.LastLine)
	}
}

func cgRetStat(fi *funcInfo, exps []Exp, lastLine int) {
	nExps := len(exps)
	if nExps == 0 {
		fi.emitReturn(lastLine, 0, 0)
		return
	}

	if nExps == 1 {
		if nameExp, ok := exps[0].(*NameExp); ok {
			if r := fi.slotOfLocVar(nameExp.Name); r >= 0 {
				fi.emitReturn(lastLine, r, 1)
				return
			}
		}
		if fcExp, ok := exps[0].(*FuncCallExp); ok {
			r := fi.allocReg()
			cgTailCallExp(fi, fcExp, r)
			fi.freeReg()
			fi.emitReturn(lastLine, r, -1)
			return
		}
	}

	multRet := isVarargOrFuncCall(exps[nExps-1])
	for i, exp := range exps {
		r := fi.allocReg()
		if i == nExps-1 && multRet {
			cgExp(fi, exp, r, -1)
		} else {
			cgExp(fi, exp, r, 1)
		}
	}
	fi.freeRegs(nExps)

	a := fi.usedRegs // correct?
	if multRet {
		fi.emitReturn(lastLine, a, -1)
	} else {
		fi.emitReturn(lastLine, a, nExps)
	}
}

17.3 编译语句

接下来时语句的编译过程.

func cgStat(fi *funcInfo, node Stat) {
	switch stat := node.(type) {
	case *FuncCallStat:
		cgFuncCallStat(fi, stat)
	case *BreakStat:
		cgBreakStat(fi, stat)
	case *DoStat:
		cgDoStat(fi, stat)
	case *WhileStat:
		cgWhileStat(fi, stat)
	case *RepeatStat:
		cgRepeatStat(fi, stat)
	case *IfStat:
		cgIfStat(fi, stat)
	case *ForNumStat:
		cgForNumStat(fi, stat)
	case *ForInStat:
		cgForInStat(fi, stat)
	case *AssignStat:
		cgAssignStat(fi, stat)
	case *LocalVarDeclStat:
		cgLocalVarDeclStat(fi, stat)
	case *LocalFuncDefStat:
		cgLocalFuncDefStat(fi, stat)
	case *LabelStat:
		cgLabelStat(fi, stat)
	case *GotoStat:
		cgGotoStat(fi, stat)
	}
}

17.3.1 简单语句

local function f() end 等价于 local f; f=function() end

func cgLocalFuncDefStat(fi *funcInfo, node *LocalFuncDefStat) {
	r := fi.addLocVar(node.Name, fi.pc()+2)
	cgFuncDefExp(fi, node.Exp, r)
}
func cgFuncCallStat(fi *funcInfo, node *FuncCallStat) {
	r := fi.allocReg()
	cgFuncCallExp(fi, node, r, 0)
	fi.freeReg()
}

func cgBreakStat(fi *funcInfo, node *BreakStat) {
	pc := fi.emitJmp(node.Line, 0, 0)
	fi.addBreakJmp(pc)
}

func cgDoStat(fi *funcInfo, node *DoStat) {
	fi.enterScope(false)
	cgBlock(fi, node.Block)
	fi.closeOpenUpvals(node.Block.LastLine)
	fi.exitScope(fi.pc() + 1)
}

生成过程大部分都是 addLocVar 声明局部变量, allocReg 申请寄存器, enterScope, 进入新的作用域等.

closeOpenUpvals() 负责 Upvalue 闭合, getJmpArgA 负责获取跳转地址 A.

func (fi *funcInfo) closeOpenUpvals(line int) {
	a := fi.getJmpArgA()
	if a > 0 {
		fi.emitJmp(line, a, 0)
	}
}

func (fi *funcInfo) getJmpArgA() int {
	hasCapturedLocVars := false
	minSlotOfLocVars := fi.maxRegs
	for _, locVar := range fi.locNames {
		if locVar.scopeLv == fi.scopeLv {
			for v := locVar; v != nil && v.scopeLv == fi.scopeLv; v = v.prev {
				if v.captured {
					hasCapturedLocVars = true
				}
				if v.slot < minSlotOfLocVars && v.name[0] != '(' {
					minSlotOfLocVars = v.slot
				}
			}
		}
	}
	if hasCapturedLocVars {
		return minSlotOfLocVars + 1
	} else {
		return 0
	}
}

17.3.2 while 和 repeat 语句.

/*
           ______________
          /  false? jmp  |
         /               |
while exp do block end <-'
      ^           \
      |___________/
           jmp
*/
func cgWhileStat(fi *funcInfo, node *WhileStat) {
	pcBeforeExp := fi.pc()

	oldRegs := fi.usedRegs
	a, _ := expToOpArg(fi, node.Exp, ArgReg)
	fi.usedRegs = oldRegs

	line := lastLineOf(node.Exp)
	fi.emitTest(line, a, 0)
	pcJmpToEnd := fi.emitJmp(line, 0, 0)

	fi.enterScope(true)
	cgBlock(fi, node.Block)
	fi.closeOpenUpvals(node.Block.LastLine)
	fi.emitJmp(node.Block.LastLine, 0, pcBeforeExp-fi.pc()-1)
	fi.exitScope(fi.pc())

	fi.fixSbx(pcJmpToEnd, fi.pc()-pcJmpToEnd)
}

/*
        ______________
       |  false? jmp  |
       V              /
repeat block until exp
*/
func cgRepeatStat(fi *funcInfo, node *RepeatStat) {
	fi.enterScope(true)

	pcBeforeBlock := fi.pc()
	cgBlock(fi, node.Block)

	oldRegs := fi.usedRegs
	a, _ := expToOpArg(fi, node.Exp, ArgReg)
	fi.usedRegs = oldRegs

	line := lastLineOf(node.Exp)
	fi.emitTest(line, a, 0)
	fi.emitJmp(line, fi.getJmpArgA(), pcBeforeBlock-fi.pc()-1)
	fi.closeOpenUpvals(line)

	fi.exitScope(fi.pc() + 1)
}


这两个语句的核心即获取当前地址, 然后生成跳转地址实现循环. 然后对条件表达式进行求值, 决定是否跳出.

17.3.3 if 语句

/*
         _________________       _________________       _____________
        / false? jmp      |     / false? jmp      |     / false? jmp  |
       /                  V    /                  V    /              V
if exp1 then block1 elseif exp2 then block2 elseif true then block3 end <-.
                   \                       \                       \      |
                    \_______________________\_______________________\_____|
                    jmp                     jmp                     jmp
*/
func cgIfStat(fi *funcInfo, node *IfStat) {
	pcJmpToEnds := make([]int, len(node.Exps))
	pcJmpToNextExp := -1

	for i, exp := range node.Exps {
		if pcJmpToNextExp >= 0 {
			fi.fixSbx(pcJmpToNextExp, fi.pc()-pcJmpToNextExp)
		}

		oldRegs := fi.usedRegs
		a, _ := expToOpArg(fi, exp, ArgReg)
		fi.usedRegs = oldRegs

		line := lastLineOf(exp)
		fi.emitTest(line, a, 0)
		pcJmpToNextExp = fi.emitJmp(line, 0, 0)

		block := node.Blocks[i]
		fi.enterScope(false)
		cgBlock(fi, block)
		fi.closeOpenUpvals(block.LastLine)
		fi.exitScope(fi.pc() + 1)
		if i < len(node.Exps)-1 {
			pcJmpToEnds[i] = fi.emitJmp(block.LastLine, 0, 0)
		} else {
			pcJmpToEnds[i] = pcJmpToNextExp
		}
	}

	for _, pc := range pcJmpToEnds {
		fi.fixSbx(pc, fi.pc()-pc)
	}
}

求值, 执行代码块, 或求值, 跳转的过程.

17.3.4 for 循环语句

func cgForNumStat(fi *funcInfo, node *ForNumStat) {
	forIndexVar := "(for index)"
	forLimitVar := "(for limit)"
	forStepVar := "(for step)"

	fi.enterScope(true)

	cgLocalVarDeclStat(fi, &LocalVarDeclStat{
		NameList: []string{forIndexVar, forLimitVar, forStepVar},
		ExpList:  []Exp{node.InitExp, node.LimitExp, node.StepExp},
	})
	fi.addLocVar(node.VarName, fi.pc()+2)

	a := fi.usedRegs - 4
	pcForPrep := fi.emitForPrep(node.LineOfDo, a, 0)
	cgBlock(fi, node.Block)
	fi.closeOpenUpvals(node.Block.LastLine)
	pcForLoop := fi.emitForLoop(node.LineOfFor, a, 0)

	fi.fixSbx(pcForPrep, pcForLoop-pcForPrep-1)
	fi.fixSbx(pcForLoop, pcForPrep-pcForLoop)

	fi.exitScope(fi.pc())
	fi.fixEndPC(forIndexVar, 1)
	fi.fixEndPC(forLimitVar, 1)
	fi.fixEndPC(forStepVar, 1)
}

func cgForInStat(fi *funcInfo, node *ForInStat) {
	forGeneratorVar := "(for generator)"
	forStateVar := "(for state)"
	forControlVar := "(for control)"

	fi.enterScope(true)

	cgLocalVarDeclStat(fi, &LocalVarDeclStat{
		//LastLine: 0,
		NameList: []string{forGeneratorVar, forStateVar, forControlVar},
		ExpList:  node.ExpList,
	})
	for _, name := range node.NameList {
		fi.addLocVar(name, fi.pc()+2)
	}

	pcJmpToTFC := fi.emitJmp(node.LineOfDo, 0, 0)
	cgBlock(fi, node.Block)
	fi.closeOpenUpvals(node.Block.LastLine)
	fi.fixSbx(pcJmpToTFC, fi.pc()-pcJmpToTFC)

	line := lineOf(node.ExpList[0])
	rGenerator := fi.slotOfLocVar(forGeneratorVar)
	fi.emitTForCall(line, rGenerator, len(node.NameList))
	fi.emitTForLoop(line, rGenerator+2, pcJmpToTFC-fi.pc()-1)

	fi.exitScope(fi.pc() - 1)
	fi.fixEndPC(forGeneratorVar, 2)
	fi.fixEndPC(forStateVar, 2)
	fi.fixEndPC(forControlVar, 2)
}

for 循环需要三个特殊局部变量, 用于存储索引, 限制, 步长. 然后执行 block, 最后修复偏移量.

17.3.5 局部变量声明语句

func cgLocalVarDeclStat(fi *funcInfo, node *LocalVarDeclStat) {
	exps := removeTailNils(node.ExpList)
	nExps := len(exps)
	nNames := len(node.NameList)

	oldRegs := fi.usedRegs
	if nExps == nNames {
		for _, exp := range exps {
			a := fi.allocReg()
			cgExp(fi, exp, a, 1)
		}
	} else if nExps > nNames {
		for i, exp := range exps {
			a := fi.allocReg()
			if i == nExps-1 && isVarargOrFuncCall(exp) {
				cgExp(fi, exp, a, 0)
			} else {
				cgExp(fi, exp, a, 1)
			}
		}
	} else { // nNames > nExps
		multRet := false
		for i, exp := range exps {
			a := fi.allocReg()
			if i == nExps-1 && isVarargOrFuncCall(exp) {
				multRet = true
				n := nNames - nExps + 1
				cgExp(fi, exp, a, n)
				fi.allocRegs(n - 1)
			} else {
				cgExp(fi, exp, a, 1)
			}
		}
		if !multRet {
			n := nNames - nExps
			a := fi.allocRegs(n)
			fi.emitLoadNil(node.LastLine, a, n)
		}
	}

	fi.usedRegs = oldRegs
	startPC := fi.pc() + 1
	for _, name := range node.NameList {
		fi.addLocVar(name, startPC)
	}
}

第一部分是等号左右相等, 则分配临时变量, 进行表达式求值, 然后局部变量与寄存器绑定.
第二部分是表达式比名称多, 多余的也进行求值即可.
第三部分是表达式比变量名称少, 则要么多次赋值, 要么装入 nil.

17.3.6 赋值语句

func cgAssignStat(fi *funcInfo, node *AssignStat) {
	exps := removeTailNils(node.ExpList)
	nExps := len(exps)
	nVars := len(node.VarList)

	tRegs := make([]int, nVars)
	kRegs := make([]int, nVars)
	vRegs := make([]int, nVars)
	oldRegs := fi.usedRegs

	for i, exp := range node.VarList {
		if taExp, ok := exp.(*TableAccessExp); ok {
			tRegs[i] = fi.allocReg()
			cgExp(fi, taExp.PrefixExp, tRegs[i], 1)
			kRegs[i] = fi.allocReg()
			cgExp(fi, taExp.KeyExp, kRegs[i], 1)
		} else {
			name := exp.(*NameExp).Name
			if fi.slotOfLocVar(name) < 0 && fi.indexOfUpval(name) < 0 {
				// global var
				kRegs[i] = -1
				if fi.indexOfConstant(name) > 0xFF {
					kRegs[i] = fi.allocReg()
				}
			}
		}
	}
	for i := 0; i < nVars; i++ {
		vRegs[i] = fi.usedRegs + i
	}

	if nExps >= nVars {
		for i, exp := range exps {
			a := fi.allocReg()
			if i >= nVars && i == nExps-1 && isVarargOrFuncCall(exp) {
				cgExp(fi, exp, a, 0)
			} else {
				cgExp(fi, exp, a, 1)
			}
		}
	} else { // nVars > nExps
		multRet := false
		for i, exp := range exps {
			a := fi.allocReg()
			if i == nExps-1 && isVarargOrFuncCall(exp) {
				multRet = true
				n := nVars - nExps + 1
				cgExp(fi, exp, a, n)
				fi.allocRegs(n - 1)
			} else {
				cgExp(fi, exp, a, 1)
			}
		}
		if !multRet {
			n := nVars - nExps
			a := fi.allocRegs(n)
			fi.emitLoadNil(node.LastLine, a, n)
		}
	}

	lastLine := node.LastLine
	for i, exp := range node.VarList {
		if nameExp, ok := exp.(*NameExp); ok {
			varName := nameExp.Name
			if a := fi.slotOfLocVar(varName); a >= 0 {
				fi.emitMove(lastLine, a, vRegs[i])
			} else if b := fi.indexOfUpval(varName); b >= 0 {
				fi.emitSetUpval(lastLine, vRegs[i], b)
			} else if a := fi.slotOfLocVar("_ENV"); a >= 0 {
				if kRegs[i] < 0 {
					b := 0x100 + fi.indexOfConstant(varName)
					fi.emitSetTable(lastLine, a, b, vRegs[i])
				} else {
					fi.emitSetTable(lastLine, a, kRegs[i], vRegs[i])
				}
			} else { // global var
				a := fi.indexOfUpval("_ENV")
				if kRegs[i] < 0 {
					b := 0x100 + fi.indexOfConstant(varName)
					fi.emitSetTabUp(lastLine, a, b, vRegs[i])
				} else {
					fi.emitSetTabUp(lastLine, a, kRegs[i], vRegs[i])
				}
			}
		} else {
			fi.emitSetTable(lastLine, tRegs[i], kRegs[i], vRegs[i])
		}
	}

	// todo
	fi.usedRegs = oldRegs
}

赋值语句复杂在可以多重赋值, 给局部变量, Upvalue, 全局变量赋值, 还可以根据索引修改表.
第一部分首先分配临时变量, 对表达式进行求值. tRegs, kRegs, vRegs 分别表示表, 键, 值分配的临时变量.

for i, exp := range node.VarList 这里开始处理索引赋值.
if nExps >= nVars 这里开始是多重赋值.
for i, exp := range node.VarList 这里做具体的赋值处理, 局部变量赋值使用 MOVE, Upvalue 赋值使用 SETUPVALUE, 全局变量生成 SETTABUP. 按索引则生成 SETTABLE, 最后还要释放临时变量.

17.4 编译表达式

Lua 表达式可分为字面量表达式, 构造器表达式, 运算符表达式, 前缀表达式, vararg 表达式.

cgExp 负责表达式编译.

// todo: rename to evalExp()?
func cgExp(fi *funcInfo, node Exp, a, n int) {
	switch exp := node.(type) {
	case *NilExp:
		fi.emitLoadNil(exp.Line, a, n)
	case *FalseExp:
		fi.emitLoadBool(exp.Line, a, 0, 0)
	case *TrueExp:
		fi.emitLoadBool(exp.Line, a, 1, 0)
	case *IntegerExp:
		fi.emitLoadK(exp.Line, a, fi.indexOfConstant(exp.Val))
	case *FloatExp:
		fi.emitLoadK(exp.Line, a, fi.indexOfConstant(exp.Val))
	case *StringExp:
		fi.emitLoadK(exp.Line, a, fi.indexOfConstant(exp.Str))
	case *ParensExp:
		cgExp(fi, exp.Exp, a, 1)
	case *VarargExp:
		cgVarargExp(fi, exp, a, n)
	case *FuncDefExp:
		cgFuncDefExp(fi, exp, a)
	case *TableConstructorExp:
		cgTableConstructorExp(fi, exp, a)
	case *UnopExp:
		cgUnopExp(fi, exp, a)
	case *BinopExp:
		cgBinopExp(fi, exp, a)
	case *ConcatExp:
		cgConcatExp(fi, exp, a)
	case *NameExp:
		cgNameExp(fi, exp, a)
	case *TableAccessExp:
		cgTableAccessExp(fi, exp, a)
	case *FuncCallExp:
		cgFuncCallExp(fi, exp, a, n)
	}
}

字面量全部在函数中处理了, 执行相关的 load 指令即可.

func cgVarargExp(fi *funcInfo, node *VarargExp, a, n int) {
	if !fi.isVararg {
		panic("cannot use '...' outside a vararg function")
	}
	fi.emitVararg(node.Line, a, n)
}

17.4.1 函数定义表达式

// f[a] := function(args) body end
func cgFuncDefExp(fi *funcInfo, node *FuncDefExp, a int) {
	subFI := newFuncInfo(fi, node)
	fi.subFuncs = append(fi.subFuncs, subFI)

	for _, param := range node.ParList {
		subFI.addLocVar(param, 0)
	}

	cgBlock(subFI, node.Block)
	subFI.exitScope(subFI.pc() + 2)
	subFI.emitReturn(node.LastLine, 0, 0)

	bx := len(fi.subFuncs) - 1
	fi.emitClosure(node.LastLine, a, bx)
}

即, 生成新的 funcInfo, 然后跟上层 funcInfo 绑定, 然后将参数转化为局部变量, 然后进行 return, 最后添加 closure.

17.4.2 表构造表达式

func cgTableConstructorExp(fi *funcInfo, node *TableConstructorExp, a int) {
	nArr := 0
	for _, keyExp := range node.KeyExps {
		if keyExp == nil {
			nArr++
		}
	}
	nExps := len(node.KeyExps)
	multRet := nExps > 0 &&
		isVarargOrFuncCall(node.ValExps[nExps-1])

	fi.emitNewTable(node.Line, a, nArr, nExps-nArr)

	arrIdx := 0
	for i, keyExp := range node.KeyExps {
		valExp := node.ValExps[i]

		if keyExp == nil {
			arrIdx++
			tmp := fi.allocReg()
			if i == nExps-1 && multRet {
				cgExp(fi, valExp, tmp, -1)
			} else {
				cgExp(fi, valExp, tmp, 1)
			}

			if arrIdx%50 == 0 || arrIdx == nArr { // LFIELDS_PER_FLUSH
				n := arrIdx % 50
				if n == 0 {
					n = 50
				}
				fi.freeRegs(n)
				line := lastLineOf(valExp)
				c := (arrIdx-1)/50 + 1 // todo: c > 0xFF
				if i == nExps-1 && multRet {
					fi.emitSetList(line, a, 0, c)
				} else {
					fi.emitSetList(line, a, n, c)
				}
			}

			continue
		}

		b := fi.allocReg()
		cgExp(fi, keyExp, b, 1)
		c := fi.allocReg()
		cgExp(fi, valExp, c, 1)
		fi.freeRegs(2)

		line := lastLineOf(valExp)
		fi.emitSetTable(line, a, b, c)
	}
}

先初始化, 然后 emitNewTable(). 接下来开始构造表. 最后做关联处理并 emitSetTable().

17.4.3 运算符表达式

// r[a] := op exp
func cgUnopExp(fi *funcInfo, node *UnopExp, a int) {
	oldRegs := fi.usedRegs
	b, _ := expToOpArg(fi, node.Exp, ArgReg)
	fi.emitUnaryOp(node.Line, node.Op, a, b)
	fi.usedRegs = oldRegs
}

// r[a] = op r[b]
func (cb *codeBuf) emitUnaryOp(line, op, a, b int) {
	switch op {
	case TOKEN_OP_NOT:
		cb.emitABC(line, OP_NOT, a, b, 0)
	case TOKEN_OP_BNOT:
		cb.emitABC(line, OP_BNOT, a, b, 0)
	case TOKEN_OP_LEN:
		cb.emitABC(line, OP_LEN, a, b, 0)
	case TOKEN_OP_UNM:
		cb.emitABC(line, OP_UNM, a, b, 0)
	}
}

运算符表达式同样先申请临时变量, 对表达式求值, 最后释放临时变量并生成相应的一元运算符即可.

// r[a] := exp1 .. exp2
func cgConcatExp(fi *funcInfo, node *ConcatExp, a int) {
	for _, subExp := range node.Exps {
		a := fi.allocReg()
		cgExp(fi, subExp, a, 1)
	}

	c := fi.usedRegs - 1
	b := c - len(node.Exps) + 1
	fi.freeRegs(c - b + 1)
	fi.emitABC(node.Line, OP_CONCAT, a, b, c)
}

由于拼接表达式被我们处理成了树状结构, 因此每次循环先进行求值, 然后执行 emitABC.

// r[a] := exp1 op exp2
func cgBinopExp(fi *funcInfo, node *BinopExp, a int) {
	switch node.Op {
	case TOKEN_OP_AND, TOKEN_OP_OR:
		oldRegs := fi.usedRegs

		b, _ := expToOpArg(fi, node.Exp1, ArgReg)
		fi.usedRegs = oldRegs
		if node.Op == TOKEN_OP_AND {
			fi.emitTestSet(node.Line, a, b, 0)
		} else {
			fi.emitTestSet(node.Line, a, b, 1)
		}
		pcOfJmp := fi.emitJmp(node.Line, 0, 0)

		b, _ = expToOpArg(fi, node.Exp2, ArgReg)
		fi.usedRegs = oldRegs
		fi.emitMove(node.Line, a, b)
		fi.fixSbx(pcOfJmp, fi.pc()-pcOfJmp)
	default:
		oldRegs := fi.usedRegs
		b, _ := expToOpArg(fi, node.Exp1, ArgRK)
		c, _ := expToOpArg(fi, node.Exp2, ArgRK)
		fi.emitBinaryOp(node.Line, node.Op, a, b, c)
		fi.usedRegs = oldRegs
	}
}

17.4.4 名字和表访问表达式

// r[a] := name
func cgNameExp(fi *funcInfo, node *NameExp, a int) {
	if r := fi.slotOfLocVar(node.Name); r >= 0 {
		fi.emitMove(node.Line, a, r)
	} else if idx := fi.indexOfUpval(node.Name); idx >= 0 {
		fi.emitGetUpval(node.Line, a, idx)
	} else { // x => _ENV['x']
		taExp := &TableAccessExp{
			LastLine:  node.Line,
			PrefixExp: &NameExp{node.Line, "_ENV"},
			KeyExp:    &StringExp{node.Line, node.Name},
		}
		cgTableAccessExp(fi, taExp, a)
	}
}

同样区分局部, Upvalue, 全局变量处理即可.

17.4.5 函数调用表达式

// r[a] := f(args)
func cgFuncCallExp(fi *funcInfo, node *FuncCallExp, a, n int) {
	nArgs := prepFuncCall(fi, node, a)
	fi.emitCall(node.Line, a, nArgs, n)
}

// return f(args)
func cgTailCallExp(fi *funcInfo, node *FuncCallExp, a int) {
	nArgs := prepFuncCall(fi, node, a)
	fi.emitTailCall(node.Line, a, nArgs)
}

func prepFuncCall(fi *funcInfo, node *FuncCallExp, a int) int {
	nArgs := len(node.Args)
	lastArgIsVarargOrFuncCall := false

	cgExp(fi, node.PrefixExp, a, 1)
	if node.NameExp != nil {
		fi.allocReg()
		c, k := expToOpArg(fi, node.NameExp, ArgRK)
		fi.emitSelf(node.Line, a, a, c)
		if k == ArgReg {
			fi.freeRegs(1)
		}
	}
	for i, arg := range node.Args {
		tmp := fi.allocReg()
		if i == nArgs-1 && isVarargOrFuncCall(arg) {
			lastArgIsVarargOrFuncCall = true
			cgExp(fi, arg, tmp, -1)
		} else {
			cgExp(fi, arg, tmp, 1)
		}
	}
	fi.freeRegs(nArgs)

	if node.NameExp != nil {
		fi.freeReg()
		nArgs++
	}
	if lastArgIsVarargOrFuncCall {
		nArgs = -1
	}

	return nArgs
}

17.5 生成函数原型

func toProto(fi *funcInfo) *Prototype {
	proto := &Prototype{
		LineDefined:     uint32(fi.line),
		LastLineDefined: uint32(fi.lastLine),
		NumParams:       byte(fi.numParams),
		MaxStackSize:    byte(fi.maxRegs),
		Code:            fi.insts,
		Constants:       getConstants(fi),
		Upvalues:        getUpvalues(fi),
		Protos:          toProtos(fi.subFuncs),
		LineInfo:        fi.lineNums,
		LocVars:         getLocVars(fi),
		UpvalueNames:    getUpvalueNames(fi),
	}

	if fi.line == 0 {
		proto.LastLineDefined = 0
	}
	if proto.MaxStackSize < 2 {
		proto.MaxStackSize = 2 // todo
	}
	if fi.isVararg {
		proto.IsVararg = 1 // todo
	}

	return proto
}


func toProtos(fis []*funcInfo) []*Prototype {
	protos := make([]*Prototype, len(fis))
	for i, fi := range fis {
		protos[i] = toProto(fi)
	}
	return protos
}

func getConstants(fi *funcInfo) []interface{} {
	consts := make([]interface{}, len(fi.constants))
	for k, idx := range fi.constants {
		consts[idx] = k
	}
	return consts
}

func getLocVars(fi *funcInfo) []LocVar {
	locVars := make([]LocVar, len(fi.locVars))
	for i, locVar := range fi.locVars {
		locVars[i] = LocVar{
			VarName: locVar.name,
			StartPC: uint32(locVar.startPC),
			EndPC:   uint32(locVar.endPC),
		}
	}
	return locVars
}

func getUpvalues(fi *funcInfo) []Upvalue {
	upvals := make([]Upvalue, len(fi.upvalues))
	for _, uv := range fi.upvalues {
		if uv.locVarSlot >= 0 { // instack
			upvals[uv.index] = Upvalue{1, byte(uv.locVarSlot)}
		} else {
			upvals[uv.index] = Upvalue{0, byte(uv.upvalIndex)}
		}
	}
	return upvals
}

func getUpvalueNames(fi *funcInfo) []string {
	names := make([]string, len(fi.upvalues))
	for name, uv := range fi.upvalues {
		names[uv.index] = name
	}
	return names
}

17.6 使用编译器

func Compile(chunk, chunkName string) *binchunk.Prototype {
	ast := parser.Parse(chunk, chunkName)
	proto := codegen.GenProto(ast)
	setSource(proto, chunkName)
	return proto
}

func setSource(proto *binchunk.Prototype, chunkName string) {
	proto.Source = chunkName
	for _, f := range proto.Protos {
		setSource(f, chunkName)
	}
}

17.7 测试本章代码

17.8 本章小结