即Lua预编译形成的虚拟机执行的字节码. 叫做预编译 Chunk (Precomplied Chunk) 或二进制 Chunk (Binary Chunk).
Lua 编译器以函数为单位进行编译, 每个函数都会被编译为一个内部结构, 称为 原型 (Prototype).
原型包括:
函数原型是一种递归结构, 并且 Lua 源码中函数的嵌套关系会直接反映在编译后的原型里.
在编译过程, 暴露在全局的语句会被包装进 main 函数. 这是自动完成的.
这是 main 的:
[root@m01 c1]# luac -l hello.out
main <./hello.lua:0,0> (4 instructions, 16 bytes at 0x173c530)
0+ params, 2 slots, 0 upvalues, 0 locals, 2 constants, 0 functions
1 [1] GETGLOBAL 0 -1 ; print
2 [1] LOADK 1 -2 ; "hello"
3 [1] CALL 0 2 1
4 [1] RETURN 0 1
这是 foo-bar 的:
[root@m01 c1]# luac -l foobar.lua
main <foobar.lua:0,0> (3 instructions, 12 bytes at 0xb74530)
0+ params, 2 slots, 0 upvalues, 0 locals, 1 constant, 1 function
1 [3] CLOSURE 0 0 ; 0xb74710
2 [1] SETGLOBAL 0 -1 ; foo
3 [3] RETURN 0 1
function <foobar.lua:1,3> (3 instructions, 12 bytes at 0xb74710)
0 params, 2 slots, 0 upvalues, 0 locals, 1 constant, 1 function
1 [2] CLOSURE 0 0 ; 0xb748c0
2 [2] SETGLOBAL 0 -1 ; bar
3 [3] RETURN 0 1
function <foobar.lua:2,2> (1 instruction, 4 bytes at 0xb748c0)
0 params, 2 slots, 0 upvalues, 0 locals, 0 constants, 0 functions
1 [2] RETURN 0 1
[root@m01 c1]#
名称 <所在文件: 开始行号, 结束行号> (指令数量, 函数地址)
固定参数数量 (如果有+, 则代表是 vararg 函数), 运行函数所需寄存器数量, upvalue 数量, 局部变量数量, 常量数量, 子函数数量.
指令序号, 对应行号, 操作码, 操作数, ; 后是注释.
使用 -l -l
可以查看详情, 包括常量列表, 局部变量, upvalue 表:
[root@m01 c1]# luac -l -l hello.lua
main <hello.lua:0,0> (4 instructions, 16 bytes at 0x696530)
0+ params, 2 slots, 0 upvalues, 0 locals, 2 constants, 0 functions
1 [1] GETGLOBAL 0 -1 ; print
2 [1] LOADK 1 -2 ; "hello"
3 [1] CALL 0 2 1
4 [1] RETURN 0 1
constants (2) for 0x696530:
1 "print"
2 "hello"
locals (0) for 0x696530:
upvalues (0) for 0x696530:
[root@m01 c1]#
Lua 二进制 chunk 一些细节事项:
二进制 chunk 内部使用的数据类型大致可分为 数字, 字符串, 列表 三种.
数据类型 | C 语言类型 | Go 语言类型 | 占用字节 |
---|---|---|---|
字节 | lu_Byte (unsigned char) | byte | 1 |
C 语言整型 | int | uint32 | 4 |
C 语言 size_t 类型 | size_t | uint64 | 8 |
Lua 整型 | lua_Integer (long long) | int64 | 8 |
Lua 浮点型 | lua_Number (double) | float64 | 8 |
字符串在二进制 chunk 里面是字节数组, 并记录长度.
指令表, 常量表, 子函数原型表等均使用列表存储.
先用 cint 类型记录列表长度, 然后存储 n 个列表元素.
用于抽象的 binaryChunk 结构为:
package binchunk type binaryChunk struct { header // 头部 sizeUpvalues byte // 主函数 upvalue mainFunc *Prototype // 主函数原型 }
二进制 chunk 头部:
type header struct signature [4]byte version byte format byte luacData [6]byte cintByte byte sizetSize byte instructionSize byte luaIntegerSize byte luaNumberSize byte luacInt int64 luacNum float64 }
Lua 的二进制起始签名 (魔数, Magic Number) 是 ESC, L, u, a 的 ASCII码. 即 0x1B4C7561, Go 字面量为 "\x1bLua".
[root@m01 c1]# xxd -u -g 1 ./hello.out
0000000: 1B 4C 75 61 51 00 01 04 08 04 08 00 0D 00 00 00 .LuaQ...........
0000010: 00 00 00 00 40 2E 2F 68 65 6C 6C 6F 2E 6C 75 61 ....@./hello.lua
0000020: 00 00 00 00 00 00 00 00 00 00 00 02 02 04 00 00 ................
0000030: 00 05 00 00 00 41 40 00 00 1C 40 00 01 1E 00 80 .....A@...@.....
0000040: 00 02 00 00 00 04 06 00 00 00 00 00 00 00 70 72 ..............pr
0000050: 69 6E 74 00 04 06 00 00 00 00 00 00 00 68 65 6C int..........hel
0000060: 6C 6F 00 00 00 00 00 04 00 00 00 01 00 00 00 01 lo..............
0000070: 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 00 ................
0000080: 00 00 00
紧跟在二进制签名后, 仅包含大版本号 (Major Version) 和 小版本号(Minor Version). lua 5.3.4 即 5 x 16 + 3 = 83, 即 0x53.
lua 虚拟机定义的格式号即接下来一个字节, 0.
同样用来二进制检查, "\x19\x93\r\n\x1a\n".
接下来5个字节记录 cint, size_t, lua 虚拟机指令, lua 整数, lua 浮点数 这5种数据类型在二进制 chunk 占用的字节数.
接下来存放的是整数值 0x5678 用于检查大小端.
然后存放浮点数 370.5 用以检测浮点数格式.
函数原型结构体为:
type Prototype struct { Source string LineDefined uint32 LastLineDefined uint32 NumParams byte IsVararg byte MaxStackSize byte Code []uint32 Constants []interface{} Upvalues []Upvalue Protos []*Prototype LineInfo []uint32 LocVars []LocVar UpvalueNames []string }
只有在主函数中, 此字段才有值. 格式为长度+1和字符串. 字符串 @ 开头代表从 lua 源文件编译而来. =stdin 则代表 从 stdin 编译.
记录原型对应起止行号.
相对于变长参数.
即是否有变长参数.
也被称作 MaxStackSize.
每条指令是 4 字节.
二进制 chunk 常量 tag 值:
tag | Lua 字面量类型 | 存储类型 |
---|---|---|
0x00 | nil | 不存储 |
0x01 | boolean | 字节 (0, 1) |
0x03 | number | Lua 浮点数 |
0x13 | integer | Lua 整数 |
0x04 | string | 短字符串 |
0x14 | string | 长字符串 |
每个元素占两个字节.
行号表中的行号与指令表中的指令一一对应, 分别记录每条指令在源代码中对应的行号.
每个元素都包含变量名 (按字符串类型存储), 起止指令索引 (按 cint 类型存储).
与 Upvalue 表一一对应, 用于记录 Upvalue 在源代码中的名字.
写了个校验头部的函数.
这里实现了:
func (self *reader) readByte() byte { b := self.data[0] self.data = self.data[1:] return b }
其余函数都基于类似 readByte()
的模式进行调整:
func (self *reader) read?() byte { i := binary.LittleEndian.?() self.data = self.data[?:] // 这里进行 maring return i }
readString() 则更复杂些, 需要根据字符串长度进行判断.
检查头部标志.
func (r *reader) checkHeader() { if string(r.readBytes(4)) != LUA_SIGNATURE { panic("not a precompiled chunk") } if r.readByte() != LUAC_VERSION { panic("version mismatch!") } if r.readByte() != LUAC_FORMAT { panic("format mismatch!") } if string(r.readBytes(6)) != LUAC_DATA { panic("corrupted!") } if r.readByte() != CINT_SIZE { panic("int size mismatch!") } if r.readByte() != CSIZET_SIZE { panic("size_t size mismatch!") } if r.readByte() != INSTRUCTION_SIZE { panic("instruction size mismatch!") } if r.readByte() != LUA_INTEGER_SIZE { panic("lua_Integer size mismatch!") } if r.readByte() != LUA_NUMBER_SIZE { panic("lua_Number size mismatch!") } if r.readLuaInteger() != LUAC_INT { panic("endianness mismatch!") } if r.readLuaNumber() != LUAC_NUM { panic("float format mismatch!") } }
其中 readProto() 用于读取函数原型, 用于填充 Prototype struct. 其中读取 Byte 部分可以直接使用 readByte(), 而其他较长或复杂部分还需要基于之前定义的 read*() 函数进行包装. 包括:
本节实现了类似 luac -l -l 的 Lua chunk 解析功能. 具体看源代码即可, 没有复杂内容.
基于栈的虚拟机需要使用 PUSH, POP 类指令操作栈. 因此指令集相对较大, 但指令平均长度较短. 基于寄存器的虚拟机可以直接二对寄存器进行寻址, 因此指令集相对较小, 但由于需要把寄存器递指编码进指令, 因此指令平均长度比较长.
Lua 虚拟机采用定长指令集, 每条指令占4个字节 (32b), 6b 用于 Opcode (操作码), 其余 26b 用于 Operand (操作数).
讨论编码模式, 操作数, 操作码.
Lua 虚拟机指令可以分为4类, 分别对应4种编码模式:
在 Lua 虚拟机总计 47 条指令中, 39 条使用 iABC 模式. 剩余 iABx (3), iAsBx (4), iAx (1).
总计 47 个.
操作数 A 主要用于表示目标寄存器索引. 其他操作数则可分为 4 种类型:
opcode 结构体:
type opcode struct { testFlag byte // operator is a test (next instruction must be a jump) setAFlag byte // instruction set register A argBMode byte // B arg mode argCMode byte // C arg mode opMode byte // op mode name string action func(i Instruction, vm api.LuaVM) }
然后源文件 lua.go/vm/opcodes.go: var opcodes
定义了所有 opcode. 格式例如:
// T A B C mode name opcode{0, 1, OpArgR, OpArgN, IABC, "MOVE "}
对应上述结构体.
这里使用了按照操作数或操作码长度进行按位与操作, 以便从 32bit 长度中提取出所需长度的值.
其中 iAsBx 提取会复杂一些:
func (instr Instruction) ABx() (a, bx int) { a = int(instr >> 6 & 0xFF) bx = int(instr >> 14) return } func (instr Instruction) AsBx() (a, sbx int) { a, bx := instr.ABx() return a, bx - MAXARG_sBx }
用了 MAXARG_sBx 作为 padding 进行位操作.
lua_State 封装了 lua 解释器.
lua state 是宿主语言和 lua 沟通得桥梁, Lua API 函数其中有很大ui部分是专门用来操作 Lua 栈的.
本节介绍 lua state 栈里能够存放哪些值, 然后介绍如何按照索引来存取这些值, 最后编码实现.
Lua 是动态类型语言, 在 Lua 代码里, 变量是不携带类型信息的, 变量的值才携带类型信息.
Lua 内置了 8 种数据类型:
print(type(nil)) -- nil print(type(true)) -- boolean print(type(3.14)) -- number print(type("hello")) -- string print(type({})) -- table print(type(print)) -- function print(type(coroutine.create(print))) -- thread print(type(io.stdin)) -- userdata
字符串是简单的字节数组. 前面几种基本类型也可以很方便映射, 映射表如下:
Lua 类型 | Go 类型 |
---|---|
nil | nil |
boolean | bool |
integer | int64 |
float | float64 |
string | string |
首先定义类型常量:
/* basic types */ const ( LUA_TNONE = iota - 1 // -1 LUA_TNIL LUA_TBOOLEAN LUA_TLIGHTUSERDATA LUA_TNUMBER LUA_TSTRING LUA_TTABLE LUA_TFUNCTION LUA_TUSERDATA LUA_TTHREAD )
其中多出来的 LUA_TNONE 用于针对栈索引的无效值.
然后是对应类型的值类型 luaValue :
type luaValue interface{}
实现 typeOf(), 这里定义了 luaValue 里面有个 type 属性:
func typeOf(val luaValue) LuaType { switch val.(type) { case nil: return LUA_TNIL case bool: return LUA_TBOOLEAN case int64, float64: return LUA_TNUMBER case string: return LUA_TSTRING case *luaTable: return LUA_TTABLE case *closure: return LUA_TFUNCTION case *luaState: return LUA_TTHREAD case *userdata: return LUA_TUSERDATA default: panic("unknown type!") } }
这里描述了 Lua 栈的设计:
luaStack 实现:
type luaStack struct { /* virtual stack */ slots []luaValue top int /* call info */ state *luaState closure *closure varargs []luaValue openuvs map[int]*upvalue pc int /* linked list */ prev *luaStack }
slots 存储值, 即 luaValue. top 记录栈顶索引.
newLuaStack() 用于创建指定容量的栈, 实现:
func newLuaStack(size int, state *luaState) *luaStack { return &luaStack{ state: state, slots: make([]luaValue, size), } }
check() 方法检查栈的空闲空间是否还可以容纳 n 个值. 如果不满足则进行扩容. (这里总感觉是个坏味道? check 还能扩容?)
func (stack *luaStack) check(n int) { free := len(stack.slots) - stack.top for i := free; i < n; i++ { stack.slots = append(stack.slots, nil) } }
push() 将值压入栈顶, 如果溢出则 painc().
func (stack *luaStack) push(val luaValue) { if stack.top == len(stack.slots) { panic("stack overflow!") } stack.slots[stack.top] = val stack.top++ }
pop() 则弹出, 如果为空则 painc().
func (stack *luaStack) pop() luaValue { if stack.top < 1 { panic("stack underflow!") } stack.top-- val := stack.slots[stack.top] stack.slots[stack.top] = nil return val }
absIndex() 将索引转换为绝对索引.
/* @@ LUAI_MAXSTACK limits the size of the Lua stack. ** CHANGE it if you need a different limit. This limit is arbitrary; ** its only purpose is to stop Lua from consuming unlimited stack ** space (and to reserve some numbers for pseudo-indices). */ const LUAI_MAXSTACK = 1000000
/* ** Pseudo-indices ** (-LUAI_MAXSTACK is the minimum valid index; we keep some free empty ** space after that to help overflow detection) */ const LUA_REGISTRYINDEX = -LUAI_MAXSTACK - 1000
即定义了一个最大达到 -LUAI_MAXSTACK - 1000 的区间用于操作 upvalue .
func (stack *luaStack) absIndex(idx int) int { // zero or positive or pseudo if idx >= 0 || idx <= LUA_REGISTRYINDEX { return idx } // negative return idx + stack.top + 1 }
isValid() 判断索引是否有效:
func (stack *luaStack) isValid(idx int) bool { if idx < LUA_REGISTRYINDEX { /* upvalues */ uvIdx := LUA_REGISTRYINDEX - idx - 1 c := stack.closure return c != nil && uvIdx < len(c.upvals) } if idx == LUA_REGISTRYINDEX { return true } absIdx := stack.absIndex(idx) return absIdx > 0 && absIdx <= stack.top }
get() 依据索引从栈取值.
func (stack *luaStack) get(idx int) luaValue { if idx < LUA_REGISTRYINDEX { /* upvalues */ uvIdx := LUA_REGISTRYINDEX - idx - 1 c := stack.closure if c == nil || uvIdx >= len(c.upvals) { return nil } return *(c.upvals[uvIdx].val) } if idx == LUA_REGISTRYINDEX { return stack.state.registry } absIdx := stack.absIndex(idx) if absIdx > 0 && absIdx <= stack.top { return stack.slots[absIdx-1] } return nil }
set() 写入值.
func (stack *luaStack) set(idx int, val luaValue) { if idx < LUA_REGISTRYINDEX { /* upvalues */ uvIdx := LUA_REGISTRYINDEX - idx - 1 c := stack.closure if c != nil && uvIdx < len(c.upvals) { *(c.upvals[uvIdx].val) = val } return } if idx == LUA_REGISTRYINDEX { stack.state.registry = val.(*luaTable) return } absIdx := stack.absIndex(idx) if absIdx > 0 && absIdx <= stack.top { stack.slots[absIdx-1] = val return } panic("invalid index!") }
Lua State 封装了整个 Lua 解释器状态.
type LuaState interface { BasicAPI DebugAPI AuxLib String() string // debug } type BasicAPI interface { /* state manipulation */ Close() // AtPanic(panicf GoFunction) GoFunction // set panic function Version() float64 // get version number /* basic stack manipulation */ GetTop() int // stack.top AbsIndex(idx int) int // abs(idx) UpvalueIndex(idx int) int // CheckStack(n int) bool // Pop(n int) // pop(n) Copy(fromIdx, toIdx int) // r[toIdx] = r[fromidx] PushValue(idx int) // push(r[idx]) Replace(idx int) // r[idx] = pop() Insert(idx int) // r[idx, -1] >> 1 Remove(idx int) // remove(r[idx]) Rotate(idx, n int) // r[idx, -1] >> n SetTop(idx int) // stack.top = idx XMove(to LuaState, n int) // to.push(pop(n)) /* access functions (stack -> C) */ TypeName(tp LuaType) string // r[idx].type.name Type(idx int) LuaType // r[idx].type IsNone(idx int) bool // r[idx].type == LUA_TNONE IsNil(idx int) bool // r[idx].type == LUA_TNIL IsNoneOrNil(idx int) bool // r[idx].type == LUA_TNONE || LUA_TNIL IsBoolean(idx int) bool // r[idx].type == LUA_TBOOLEAN IsInteger(idx int) bool // r[idx].type == LUA_TINTEGER IsNumber(idx int) bool // r[idx] ~= LuaNumber IsString(idx int) bool // r[idx] ~= LuaString IsTable(idx int) bool // r[idx].type == LUA_TTABLE IsThread(idx int) bool // r[idx].type == LUA_TTHREAD IsFunction(idx int) bool // r[idx].type == LUA_TFUNCTION IsGoFunction(idx int) bool // r[idx].type == LUA_TLCL || LUA_TGCL IsUserData(idx int) bool // r[idx].type == LUA_TUSERDATA ToBoolean(idx int) bool // r[idx] as bool ToInteger(idx int) int64 // r[idx] as LuaInteger ToIntegerX(idx int) (int64, bool) // r[idx] as LuaInteger ToNumber(idx int) float64 // r[idx] as LuaNumber ToNumberX(idx int) (float64, bool) // r[idx] as LuaNumber ToString(idx int) string // r[idx] as string ToStringX(idx int) (string, bool) // r[idx] as string ToGoFunction(idx int) GoFunction // r[idx] as GoFunction ToThread(idx int) LuaState // r[idx] as LuaThread ToUserData(idx int) UserData // r[idx] as UserData ToPointer(idx int) interface{} // r[idx] as interface{} RawLen(idx int) uint // len(r[idx]) /* push functions (C -> stack) */ PushNil() // push(nil) PushBoolean(b bool) // push(b) PushInteger(n int64) // push(n) PushNumber(n float64) // push(n) PushString(s string) // push(s) PushFString(fmt string, a ...interface{}) string // push(fmt*a) PushGoFunction(f GoFunction) // push(f) PushGoClosure(f GoFunction, n int) // push(f) PushLightUserData(d UserData) // push(d) PushGlobalTable() // push(global) PushThread() bool // push(thread) /* Comparison and arithmetic functions */ Arith(op ArithOp) // b=pop(); a=pop(); push(a op b) Compare(idx1, idx2 int, op CompareOp) bool // r[idx1] op r[idx2] RawEqual(idx1, idx2 int) bool // r[idx1] == r[idx2] /* get functions (Lua -> stack) */ NewTable() // push({}) CreateTable(nArr, nRec int) // push({}) GetTable(idx int) LuaType // push(r[idx][pop()]) GetField(idx int, k string) LuaType // push(r[idx][k]) GetI(idx int, i int64) LuaType // push(r[idx][i]) RawGet(idx int) LuaType // push(r[idx][pop()]) RawGetI(idx int, i int64) LuaType // push(r[idx][i]) RawGetP(idx int, p UserData) LuaType // push(r[idx][p]) GetGlobal(name string) LuaType // push(global[name]) GetMetatable(idx int) bool // push(r[idx].metatable)? GetUserValue(idx int) LuaType // push(r[idx].userValue) /* set functions (stack -> Lua) */ SetTable(idx int) // v=pop(); k=pop(); r[idx][k] = v SetField(idx int, k string) // r[idx][k] = pop() SetI(idx int, i int64) // r[idx][i] = pop() RawSet(idx int) // v=pop(); k=pop(); r[idx][k] = v RawSetI(idx int, i int64) // r[idx][i] = pop() RawSetP(idx int, p UserData) // r[idx][p] = pop() Register(name string, f GoFunction) // global[name] = f SetGlobal(name string) // global[name] = pop() SetMetatable(idx int) // r[idx].metatable = pop() SetUserValue(idx int) // r[idx].userValue = pop() /* 'load' and 'call' functions (load and run Lua code) */ Dump(strip bool) []byte // todo Load(chunk []byte, chunkName, mode string) ThreadStatus // push(compile(chunk)) Call(nArgs, nResults int) // args=pop(nArgs); f=pop(); f(args) CallK() // PCall(nArgs, nResults, msgh int) ThreadStatus // call(nArgs, nResults) || push(err) PCallK() // /* miscellaneous functions */ Concat(n int) // push(concat(pop(n))) Len(idx int) // push(len(r[idx])) Next(idx int) bool // key=pop(); k,v=next(r[idx]); push(k,v); StringToNumber(s string) bool // push(number(s)) Error() int // panic(r[-1]) /* coroutine functions */ NewThread() LuaState // todo Resume(from LuaState, nArgs int) ThreadStatus // todo Yield(nResults int) int // todo YieldK() // todo Status() ThreadStatus // todo IsYieldable() bool // todo /* garbage-collection function and options */ GC(what, data int) int // }
实现 luaState
type luaState struct { /* global state */ hook LuaHook hookMask int panicf GoFunction registry *luaTable /* stack */ stack *luaStack callDepth int /* coroutine */ coStatus ThreadStatus coCaller *luaState coChan chan int }
New() 创建 luaState().
func New() LuaState { ls := &luaState{} registry := newLuaTable(8, 0) registry.put(LUA_RIDX_MAINTHREAD, ls) registry.put(LUA_RIDX_GLOBALS, newLuaTable(0, 20)) ls.registry = registry ls.pushLuaStack(newLuaStack(LUA_MINSTACK, ls)) return ls }
GetTop() 反获栈顶索引.
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_gettop // lua-5.3.4/src/lapi.c#lua_gettop() func (state *luaState) GetTop() int { return state.stack.top }
AbsIndex() 把索引转换为绝对索引.
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_absindex // lua-5.3.4/src/lapi.c#lua_absindex() func (state *luaState) AbsIndex(idx int) int { return state.stack.absIndex(idx) }
CheckStack() 进行检查如果不足则扩容.
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_checkstack // lua-5.3.4/src/lapi.c#lua_checkstack() func (state *luaState) CheckStack(n int) bool { state.stack.check(n) return true // never fails }
Pop() 从栈顶弹出 n 个值.
// [-n, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_pop // lua-5.3.4/src/lua.h#lua_pop() func (state *luaState) Pop(n int) { for i := 0; i < n; i++ { state.stack.pop() } }
Copy() 把值从一个位置复制到另一个位置 (栈内复制).
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_copy // lua-5.3.4/src/lapi.c#lua_copy() func (state *luaState) Copy(fromIdx, toIdx int) { val := state.stack.get(fromIdx) state.stack.set(toIdx, val) }
PushValue() 把指定索引处的值推入栈顶.
// [-0, +1, –] // http://www.lua.org/manual/5.3/manual.html#lua_pushvalue // lua-5.3.4/src/lapi.c#lua_pushvalue() func (state *luaState) PushValue(idx int) { val := state.stack.get(idx) state.stack.push(val) }
Replace() 是 PushValue() 的反操作, 将栈顶值弹出并写入指定位置.
// [-1, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_replace // lua-5.3.4/src/lua.h#lua_replace() func (state *luaState) Replace(idx int) { val := state.stack.pop() state.stack.set(idx, val) }
Insert() 将栈顶值弹出, 并插入指定位置. 是 Rotate() 的一个特例.
// [-1, +1, –] // http://www.lua.org/manual/5.3/manual.html#lua_insert // lua-5.3.4/src/lua.h#lua_insert() func (state *luaState) Insert(idx int) { state.Rotate(idx, 1) }
Remove() 删除指定索引处的值. 可以用 Rotate() 组合 Pop() 实现.
// [-1, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_remove // lua-5.3.4/src/lua.h#lua_remove() func (state *luaState) Remove(idx int) { state.Rotate(idx, -1) state.Pop(1) }
Rotate() 方法将 [idx, top] 索引区间内的值朝栈顶方向旋转 n 个位置, 如果 n 是负数, 那么朝栈底方向旋转. 可以理解为履带式滚动位置.
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_rotate // lua-5.3.4/src/lapi.c#lua_rotate() func (state *luaState) Rotate(idx, n int) { t := state.stack.top - 1 /* end of stack segment being rotated */ p := state.stack.absIndex(idx) - 1 /* start of segment */ var m int /* end of prefix */ if n >= 0 { m = t - n } else { m = p - n - 1 } state.stack.reverse(p, m) /* reverse the prefix with length 'n' */ state.stack.reverse(m+1, t) /* reverse the suffix */ state.stack.reverse(p, t) /* reverse the entire segment */ }
Reverse() 实现:
func (stack *luaStack) reverse(from, to int) { slots := stack.slots for from < to { slots[from], slots[to] = slots[to], slots[from] from++ to-- } }
SetTop() 将栈顶索引设置为指定值. 如果指定值小于当前栈顶索引, 效果则相当于弹出 (指定为 0 则是清空). 如果大于当前栈顶索引, 则相当于插入 nil.
// [-?, +?, –] // http://www.lua.org/manual/5.3/manual.html#lua_settop // lua-5.3.4/src/lapi.c#lua_settop() func (state *luaState) SetTop(idx int) { newTop := state.stack.absIndex(idx) if newTop < 0 { panic("stack underflow!") } n := state.stack.top - newTop if n > 0 { for i := 0; i < n; i++ { state.stack.pop() } } else if n < 0 { for i := 0; i > n; i-- { state.stack.push(nil) } } }
这里的 Push() 方法将外部值推入栈顶.
// [-0, +1, –] // http://www.lua.org/manual/5.3/manual.html#lua_pushnil func (state *luaState) PushNil() { state.stack.push(nil) } // [-0, +1, –] // http://www.lua.org/manual/5.3/manual.html#lua_pushboolean func (state *luaState) PushBoolean(b bool) { state.stack.push(b) } // [-0, +1, –] // http://www.lua.org/manual/5.3/manual.html#lua_pushinteger func (state *luaState) PushInteger(n int64) { state.stack.push(n) } // [-0, +1, –] // http://www.lua.org/manual/5.3/manual.html#lua_pushnumber func (state *luaState) PushNumber(n float64) { state.stack.push(n) } // [-0, +1, m] // http://www.lua.org/manual/5.3/manual.html#lua_pushstring func (state *luaState) PushString(s string) { state.stack.push(s) } // [-0, +1, e] // http://www.lua.org/manual/5.3/manual.html#lua_pushfstring func (state *luaState) PushFString(fmtStr string, a ...interface{}) string { str := fmt.Sprintf(fmtStr, a...) state.stack.push(str) return str }
Access 系列方法用于从栈里获取信息.
TypeName() 获取类型的字符串名称.
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_typename // lua-5.3.4/src/lapi.c#lua_typename() func (state *luaState) TypeName(tp LuaType) string { switch tp { case LUA_TNONE: return "no value" case LUA_TNIL: return "nil" case LUA_TBOOLEAN: return "boolean" case LUA_TNUMBER: return "number" case LUA_TSTRING: return "string" case LUA_TTABLE: return "table" case LUA_TFUNCTION: return "function" case LUA_TUSERDATA: return "userdata" case LUA_TTHREAD: return "thread" default: panic("unreachable!") } }
Type() 根据索引反获值的类型.
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_type // lua-5.3.4/src/lapi.c#lua_type() func (state *luaState) Type(idx int) LuaType { if state.stack.isValid(idx) { val := state.stack.get(idx) return typeOf(val) } else { return LUA_TNONE } }
IsType, IsNone, IsNil, IsNoneOrNil, IsBoolean.
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_isnone // lua-5.3.4/src/lua.h#lua_isnone() func (state *luaState) IsNone(idx int) bool { return state.Type(idx) == LUA_TNONE } // [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_isnil // lua-5.3.4/src/lua.h#lua_isnil() func (state *luaState) IsNil(idx int) bool { return state.Type(idx) == LUA_TNIL } // [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_isnoneornil // lua-5.3.4/src/lua.h#lua_isnoneornil() func (state *luaState) IsNoneOrNil(idx int) bool { return state.Type(idx) <= 0 } // [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_isboolean // lua-5.3.4/src/lua.h#lua_isboolean() func (state *luaState) IsBoolean(idx int) bool { return state.Type(idx) == LUA_TBOOLEAN } // [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_isstring // lua-5.3.4/src/lapi.c#lua_isstring() func (state *luaState) IsString(idx int) bool { t := state.Type(idx) return t == LUA_TSTRING || t == LUA_TNUMBER } // [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_isnumber // lua-5.3.4/src/lapi.c#lua_isnumber() func (state *luaState) IsNumber(idx int) bool { _, ok := state.ToNumberX(idx) return ok } // [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_isinteger // lua-5.3.4/src/lapi.c#lua_isinteger() func (state *luaState) IsInteger(idx int) bool { val := state.stack.get(idx) _, ok := val.(int64) return ok }
ToBoolean() 从指定索引处取出一个布尔值, 如果不是布尔类型, 则进行类型转换.
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_toboolean func (state *luaState) ToBoolean(idx int) bool { val := state.stack.get(idx) return convertToBoolean(val) }
convertToBoolean():
func convertToBoolean(val luaValue) bool { switch x := val.(type) { case nil: return false case bool: return x default: return true } }
ToNumber 和 ToNumberX, 从指定索引处取出一个数字. 如果不是数字类型则进行类型转换. 转换失败时, ToNumber 返回 0, ToNumberX 返回是否成功.
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_tonumber func (state *luaState) ToNumber(idx int) float64 { n, _ := state.ToNumberX(idx) return n } // [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_tonumberx func (state *luaState) ToNumberX(idx int) (float64, bool) { val := state.stack.get(idx) return convertToFloat(val) }
ToInteger, ToIntegerX
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_tointeger func (state *luaState) ToInteger(idx int) int64 { i, _ := state.ToIntegerX(idx) return i } // [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_tointegerx func (state *luaState) ToIntegerX(idx int) (int64, bool) { val := state.stack.get(idx) return convertToInteger(val) }
ToString, ToStringX 同样, 不过该函数会修改栈. 将非 string 转换为 string.
// [-0, +0, m] // http://www.lua.org/manual/5.3/manual.html#lua_tostring // http://www.lua.org/manual/5.3/manual.html#lua_tolstring // lua-5.3.4/src/lua.h#lua_tostring() // lua-5.3.4/src/lapi.c#lua_tolstring() func (state *luaState) ToString(idx int) string { s, _ := state.ToStringX(idx) return s } func (state *luaState) ToStringX(idx int) (string, bool) { val := state.stack.get(idx) switch x := val.(type) { case string: return x, true case int64, float64: s := fmt.Sprintf("%v", x) // todo state.stack.set(idx, s) // 这里会修改栈 return s, true default: return "", false } }
可分为算术 (Arithmetic) 运算符, 按位 (Bitwise) 运算符, 比较 (Comparison) 运算符, 逻辑 (Logical) 运算符, 长度运算符, 字符串拼接运算符.
需要考虑的问题有:
具体实现:
package number import "math" func FloatToInteger(f float64) (int64, bool) { // todo: correct? i := int64(f) return i, float64(i) == f } // a % b == a - ((a // b) * b) func IMod(a, b int64) int64 { return a - IFloorDiv(a, b)*b } // a % b == a - ((a // b) * b) // lua-5.3.4/src/llimits.h#luai_nummod func FMod(a, b float64) float64 { if a > 0 && math.IsInf(b, 1) || a < 0 && math.IsInf(b, -1) { return a } if a > 0 && math.IsInf(b, -1) || a < 0 && math.IsInf(b, 1) { return b } return a - math.Floor(a/b)*b } func IFloorDiv(a, b int64) int64 { if a > 0 && b > 0 || a < 0 && b < 0 || a%b == 0 { return a / b } else { return a/b - 1 } } func FFloorDiv(a, b float64) float64 { return math.Floor(a / b) } func ShiftLeft(a, n int64) int64 { if n >= 0 { return a << n } else { return ShiftRight(a, -n) } } func ShiftRight(a, n int64) int64 { if n >= 0 { return int64(uint64(a) >> n) } else { return ShiftLeft(a, -n) } }
同上, 其中左移和右移需要特殊处理.
实现 ==, <, <=
后, 就可以做等价运算了.
Lua 对逻辑与和逻辑或表达式进行短路求值.
即 #
.
即 ..
.
print(3.14 >> 2)
直接转换.
func ParseInteger(str string) (int64, bool) { str = strings.TrimSpace(str) str = strings.ToLower(str) if !reInteger.MatchString(str) { // float? return 0, false } if str[0] == '+' { str = str[1:] } if strings.Index(str, "0x") < 0 { // decimal i, err := strconv.ParseInt(str, 10, 64) return i, err == nil } // hex var sign int64 = 1 if str[0] == '-' { sign = -1 str = str[3:] } else { str = str[2:] } if len(str) > 16 { str = str[len(str)-16:] // cut long hex string } i, err := strconv.ParseUint(str, 16, 64) return sign * int64(i), err == nil } func ParseFloat(str string) (float64, bool) { str = strings.TrimSpace(str) str = strings.ToLower(str) if strings.Contains(str, "nan") || strings.Contains(str, "inf") { return 0, false } if strings.HasPrefix(str, "0x") && len(str) > 2 { return parseHexFloat(str[2:]) } if strings.HasPrefix(str, "+0x") && len(str) > 3 { return parseHexFloat(str[3:]) } if strings.HasPrefix(str, "-0x") && len(str) > 3 { f, ok := parseHexFloat(str[3:]) return -f, ok } f, err := strconv.ParseFloat(str, 64) return f, err == nil }
目前实现还没这么复杂, 直接用的 Go 内置的转换.
这两个一样, 都是先判断类型, 数字类型可以进行转换, 字符串尝试转换, 其他类型则是0.
type LuaState interface { BasicAPI DebugAPI AuxLib String() string // debug } type BasicAPI interface { Arith(op ArithOp) // b=pop(); a=pop(); push(a op b) Compare(idx1, idx2 int, op CompareOp) bool // r[idx1] op r[idx2] Concat(n int) // push(concat(pop(n))) Len(idx int) // push(len(r[idx])) }
新增实现这几个 LuaState 接口.
定义运算符:
/* arithmetic functions */ const ( LUA_OPADD = iota // + LUA_OPSUB // - LUA_OPMUL // * LUA_OPMOD // % LUA_OPPOW // ^ LUA_OPDIV // / LUA_OPIDIV // // LUA_OPBAND // & LUA_OPBOR // | LUA_OPBXOR // ~ LUA_OPSHL // << LUA_OPSHR // >> LUA_OPUNM // - LUA_OPBNOT // ~ )
var ( iadd = func(a, b int64) int64 { return a + b } fadd = func(a, b float64) float64 { return a + b } isub = func(a, b int64) int64 { return a - b } fsub = func(a, b float64) float64 { return a - b } imul = func(a, b int64) int64 { return a * b } fmul = func(a, b float64) float64 { return a * b } div = func(a, b float64) float64 { return a / b } iunm = func(a, _ int64) int64 { return -a } funm = func(a, _ float64) float64 { return -a } band = func(a, b int64) int64 { return a & b } bor = func(a, b int64) int64 { return a | b } bxor = func(a, b int64) int64 { return a ^ b } bnot = func(a, _ int64) int64 { return ^a } )
容纳整数和浮点数类型的运算的结构体为:
type operator struct { metamethod string integerFunc func(int64, int64) int64 floatFunc func(float64, float64) float64 }
操作符 slice, 用于包装操作不同数据类型的同一操作符, 即上面的结构体的实例化:
var operators = []operator{ operator{"__add", iadd, fadd}, operator{"__sub", isub, fsub}, operator{"__mul", imul, fmul}, operator{"__mod", number.IMod, number.FMod}, operator{"__pow", nil, math.Pow}, operator{"__div", nil, div}, operator{"__idiv", number.IFloorDiv, number.FFloorDiv}, operator{"__band", band, nil}, operator{"__bor", bor, nil}, operator{"__bxor", bxor, nil}, operator{"__shl", number.ShiftLeft, nil}, operator{"__shr", number.ShiftRight, nil}, operator{"__unm", iunm, funm}, operator{"__bnot", bnot, nil}, }
实现 Arith():
// [-(2|1), +1, e] // http://www.lua.org/manual/5.3/manual.html#lua_arith func (state *luaState) Arith(op ArithOp) { // 从栈取出 oprands var a, b luaValue // operands b = state.stack.pop() if op != LUA_OPUNM && op != LUA_OPBNOT { a = state.stack.pop() } else { a = b } operator := operators[op] if result := _arith(a, b, operator); result != nil { state.stack.push(result) return } mm := operator.metamethod if result, ok := callMetamethod(a, b, mm, state); ok { state.stack.push(result) return } if operator.floatFunc == nil { panic("number has no integer representation") } var typeName string if _, ok := convertToFloat(a); !ok { typeName = state.TypeName(typeOf(a)) } else { typeName = state.TypeName(typeOf(b)) } panic("attempt to perform arithmetic on a " + typeName + " value") }
逻辑是取出操作数, 然后根据 op 选择 Arith op 然后执行, 最后压入栈.
_arith() 则是具体运算逻辑:
func _arith(a, b luaValue, op operator) luaValue { if op.floatFunc == nil { // bitwise if x, ok := convertToInteger(a); ok { if y, ok := convertToInteger(b); ok { return op.integerFunc(x, y) } } } else { // arith if op.integerFunc != nil { // add,sub,mul,mod,idiv,unm if x, ok := a.(int64); ok { if y, ok := b.(int64); ok { return op.integerFunc(x, y) } } } if x, ok := convertToFloat(a); ok { if y, ok := convertToFloat(b); ok { return op.floatFunc(x, y) } } } return nil }
即按具体操作数决定是否类型转换后进行操作.
Compare() 方法比较栈上两个值, 但不对栈做出修改.
// [-0, +0, e] // http://www.lua.org/manual/5.3/manual.html#lua_compare func (state *luaState) Compare(idx1, idx2 int, op CompareOp) bool { if !state.stack.isValid(idx1) || !state.stack.isValid(idx2) { return false } a := state.stack.get(idx1) b := state.stack.get(idx2) switch op { case LUA_OPEQ: return state.eq(a, b, false) case LUA_OPLT: return state.lt(a, b) case LUA_OPLE: return state.le(a, b) default: panic("invalid compare op!") } }
同样, 具体执行 compare 的 op 需要依据操作码进行选择方法操作:
func (state *luaState) eq(a, b luaValue, raw bool) bool { switch x := a.(type) { case nil: return b == nil case bool: y, ok := b.(bool) return ok && x == y case string: y, ok := b.(string) return ok && x == y case int64: switch y := b.(type) { case int64: return x == y case float64: return float64(x) == y default: return false } case float64: switch y := b.(type) { case float64: return x == y case int64: return x == float64(y) default: return false } case *luaTable: if y, ok := b.(*luaTable); ok && x != y && !raw { if result, ok := callMetamethod(x, y, "__eq", state); ok { return convertToBoolean(result) } } return a == b case *userdata: if y, ok := b.(*userdata); ok { return x.data == y.data } return false default: return a == b } } func (state *luaState) lt(a, b luaValue) bool { switch x := a.(type) { case string: if y, ok := b.(string); ok { return x < y } case int64: switch y := b.(type) { case int64: return x < y case float64: return float64(x) < y } case float64: switch y := b.(type) { case float64: return x < y case int64: return x < float64(y) } } if result, ok := callMetamethod(a, b, "__lt", state); ok { return convertToBoolean(result) } typeName1 := state.TypeName(typeOf(a)) typeName2 := state.TypeName(typeOf(b)) panic("attempt to compare " + typeName1 + " with " + typeName2) } func (state *luaState) le(a, b luaValue) bool { switch x := a.(type) { case string: if y, ok := b.(string); ok { return x <= y } case int64: switch y := b.(type) { case int64: return x <= y case float64: return float64(x) <= y } case float64: switch y := b.(type) { case float64: return x <= y case int64: return x <= float64(y) } } if result, ok := callMetamethod(a, b, "__le", state); ok { return convertToBoolean(result) } if result, ok := callMetamethod(b, a, "__lt", state); ok { return !convertToBoolean(result) } typeName1 := state.TypeName(typeOf(a)) typeName2 := state.TypeName(typeOf(b)) panic("attempt to compare " + typeName1 + " with " + typeName2) }
这路i的对比形式是, 现检测 a 的值类型, 然后将 b 转换到 a 一致的类型, 然后进行比较.
大小比较则只有字符串和数字才有意义.
访问指定位置值, 求长度后将结果压入栈顶.
// [-0, +1, e] // http://www.lua.org/manual/5.3/manual.html#lua_len func (state *luaState) Len(idx int) { val := state.stack.get(idx) if s, ok := val.(string); ok { state.stack.push(int64(len(s))) } else if result, ok := callMetamethod(val, val, "__len", state); ok { state.stack.push(result) } else if t, ok := val.(*luaTable); ok { state.stack.push(int64(t.len())) } else { typeName := state.TypeName(typeOf(val)) panic("attempt to get length of a " + typeName + " value") } }
弹出 n 个值, 然后拼接后压入栈顶.
// [-n, +1, e] // http://www.lua.org/manual/5.3/manual.html#lua_concat func (state *luaState) Concat(n int) { if n == 0 { state.stack.push("") } else if n >= 2 { for i := 1; i < n; i++ { if state.IsString(-1) && state.IsString(-2) { s2 := state.ToString(-1) s1 := state.ToString(-2) state.stack.pop() state.stack.pop() state.stack.push(s1 + s2) continue } b := state.stack.pop() a := state.stack.pop() if result, ok := callMetamethod(a, b, "__concat", state); ok { state.stack.push(result) continue } var typeName string if _, ok := convertToFloat(a); !ok { typeName = state.TypeName(typeOf(a)) } else { typeName = state.TypeName(typeOf(b)) } panic("attempt to concatenate a " + typeName + " value") } } // n == 1, do nothing }
简单描述了 LuaVM 需要 PC 寄存器, 以及构成和执行方式.
LuaVM 接口:
type LuaVM interface { LuaState AddPC(n int) // pc += n Fetch() uint32 // code[pc++] RegisterCount() int // proto.MaxStackSize GetConst(idx int) // push(const[idx]) GetRK(rk int) // rk > 0xFF ? GetConst(rk & 0xFF) : PushValue(rk + 1) LoadProto(idx int) // push(proto[idx] as LuaFunction) LoadVararg(n int) // push(vararg[0], ..., vararg[n-1]) CloseUpvalues(a int) // close all upvalues >= R(A - 1) }
即主要操作核心都是操作 PC 或者修改栈顶.
type luaState struct { /* global state */ hook LuaHook hookMask int panicf GoFunction registry *luaTable /* stack */ stack *luaStack callDepth int /* coroutine */ coStatus ThreadStatus coCaller *luaState coChan chan int }
新增了 proto, 用于保存函数原型, pc 则是 pc 寄存器.
接下来实现刚才定义的 LuaVM 接口:
func (state *luaState) AddPC(n int) { state.stack.pc += n } func (state *luaState) Fetch() uint32 { i := state.stack.closure.proto.Code[state.stack.pc] state.stack.pc++ return i } func (state *luaState) RegisterCount() int { return int(state.stack.closure.proto.MaxStackSize) } func (state *luaState) GetConst(idx int) { c := state.stack.closure.proto.Constants[idx] state.stack.push(c) } func (state *luaState) GetRK(rk int) { if rk > 0xFF { // constant state.GetConst(rk & 0xFF) } else { // register state.PushValue(rk + 1) } }
AddPC() 和 Fetch() 就可以组成向前获取指令的方法.
GetConst() 从常量表取值并压入栈顶.
move (iABC) 把源寄存器 (索引由操作数 B 指定) 里的值移动到目标寄存器 (索引由操作数A指定).
// R(A) := R(B) func move(i Instruction, vm LuaVM) { a, b, _ := i.ABC() a += 1 b += 1 vm.Copy(b, a) }
+1
将寄存器索引转换为栈索引.
Copy 将值拷贝到栈.
// [-0, +0, –] // http://www.lua.org/manual/5.3/manual.html#lua_copy // lua-5.3.4/src/lapi.c#lua_copy() func (state *luaState) Copy(fromIdx, toIdx int) { val := state.stack.get(fromIdx) state.stack.set(toIdx, val) }
由于局部变量在寄存器中, 因此最大局部变量数量不能超过 oprand A 的长度 8, 因此最大能容纳 255 个. 实际上被限制到了 200 个以内.
JMP (iAsBx) 执行无条件跳转.
// pc+=sBx; if (A) close all upvalues >= R(A - 1) func jmp(i Instruction, vm LuaVM) { a, sBx := i.AsBx() vm.AddPC(sBx) if a != 0 { vm.CloseUpvalues(a) } }
func (state *luaState) AddPC(n int) { state.stack.pc += n }
用于把 nil, 布尔, 常量加载到寄存器里.
loadnil (iABC), 用于给连续 n 个进粗气放置 nil 值. 寄存器的起始索引由操作数 A 指定, 寄存器数量 B 指定.
// R(A), R(A+1), ..., R(A+B) := nil func loadNil(i Instruction, vm LuaVM) { a, b, _ := i.ABC() a += 1 //vm.CheckStack(1) vm.PushNil() // ~/nil for i := a; i <= a+b; i++ { vm.Copy(-1, i) } vm.Pop(1) // ~ }
实现方式是先将一个nil值压入栈顶, 然后拷贝覆盖, 最后再弹出压入的多余的值.
loadbool (iABC) 给单个寄存器设置布尔值. 寄存器索引由 A 指定, 布尔值由 B 指定. C 非 0 则跳过下一条指令.
// R(A) := (bool)B; if (C) pc++ func loadBool(i Instruction, vm LuaVM) { a, b, c := i.ABC() a += 1 //vm.CheckStack(1) vm.PushBoolean(b != 0) // ~/b vm.Replace(a) // ~ if c != 0 { vm.AddPC(1) } }
LOADK (iABx) 将常量表里的某个常量加载到指定寄存器. 寄存器索引由操作数 A 指定, 常量表索引由 Bx 指定.
Lua 函数里出现的字面量 (数字和字符串) 会被 Lua 编译器收集起来, 放进常量表里面.
// R(A) := Kst(Bx) func loadK(i Instruction, vm LuaVM) { a, bx := i.ABx() a += 1 //vm.CheckStack(1) vm.GetConst(bx) // ~/k[bx] vm.Replace(a) // ~ }
// R(A) := Kst(extra arg) func loadKx(i Instruction, vm LuaVM) { a, _ := i.ABx() a += 1 ax := Instruction(vm.Fetch()).Ax() //vm.CheckStack(1) vm.GetConst(ax) // ~/k[ax] vm.Replace(a) // ~ }
二元算术运算指令 iABC 模式, 对两个寄存器或常量值进行运算 (Oprand B, C), 将结果放入 Oprand A.
// R(A) := RK(B) op RK(C) func _binaryArith(i Instruction, vm LuaVM, op ArithOp) { a, b, c := i.ABC() a += 1 //vm.CheckStack(2) vm.GetRK(b) // ~/rk[b] vm.GetRK(c) // ~/rk[b]/rk[c] vm.Arith(op) // ~/result vm.Replace(a) // ~ }
// R(A) := op R(B) func _unaryArith(i Instruction, vm LuaVM, op ArithOp) { a, b, _ := i.ABC() a += 1 b += 1 //vm.CheckStack(1) vm.PushValue(b) // ~/r[b] vm.Arith(op) // ~/result vm.Replace(a) // ~ }
其他一元算术运算也均是这种模式.
// R(A) := length of R(B) func length(i Instruction, vm LuaVM) { a, b, _ := i.ABC() a += 1 b += 1 //vm.CheckStack(1) vm.Len(b) // ~/#r[b] vm.Replace(a) // ~ }
// R(A) := R(B).. ... ..R(C) func concat(i Instruction, vm LuaVM) { a, b, c := i.ABC() a += 1 b += 1 c += 1 n := c - b + 1 vm.CheckStack(n) for i := b; i <= c; i++ { vm.PushValue(i) // ~/r[b]/.../r[c] } vm.Concat(n) // ~/result vm.Replace(a) // ~ }
这里的 B, C 为起止索引位置. 最终结果放入 A. 计算出的 n 即为拼接的栈长度. 检查 Stack 后, 调用 vm.Concat() 进行拼接. 另外还需要检查栈空间是否够用.
iABC 模式. 可以比较寄存器或常量表里的两个值. 如果比较结果和 Oprand A (隐式转换为布尔值) 不匹配, 则跳过下一条指令. 即
if ((RK(B) op RK(C)) ~= A ) then pc++
// if ((RK(B) op RK(C)) ~= A) then pc++ func _compare(i Instruction, vm LuaVM, op CompareOp) { a, b, c := i.ABC() //vm.CheckStack(2) vm.GetRK(b) // ~/rk[b] vm.GetRK(c) // ~/rk[b]/rk[c] if vm.Compare(-2, -1, op) != (a != 0) { vm.AddPC(1) } vm.Pop(2) // ~ }
iABC 模式.
// R(A) := not R(B) func not(i Instruction, vm LuaVM) { a, b, _ := i.ABC() a += 1 b += 1 //vm.CheckStack(1) vm.PushBoolean(!vm.ToBoolean(b)) // ~/!r[b] vm.Replace(a) // ~ }
// if (R(B) <=> C) then R(A) := R(B) else pc++ func testSet(i Instruction, vm LuaVM) { a, b, c := i.ABC() a += 1 b += 1 if vm.ToBoolean(b) == (c != 0) { vm.Copy(b, a) } else { vm.AddPC(1) } }
// if not (R(A) <=> C) then pc++ func test(i Instruction, vm LuaVM) { a, _, c := i.ABC() a += 1 if vm.ToBoolean(a) != (c != 0) { vm.AddPC(1) } }
for 循环有 数值 和 统用 两种形式.数值 for 用于按一定步长遍历某个范围内的数值. 通用 for 则用于便利表.
// R(A)-=R(A+2); pc+=sBx func forPrep(i Instruction, vm LuaVM) { a, sBx := i.AsBx() a += 1 //vm.CheckStack(2) if vm.Type(a) == LUA_TSTRING { vm.PushNumber(vm.ToNumber(a)) vm.Replace(a) } if vm.Type(a+1) == LUA_TSTRING { vm.PushNumber(vm.ToNumber(a + 1)) vm.Replace(a + 1) } if vm.Type(a+2) == LUA_TSTRING { vm.PushNumber(vm.ToNumber(a + 2)) vm.Replace(a + 2) } vm.PushValue(a) // ~/r[a] vm.PushValue(a + 2) // ~/r[a]/r[a+2] vm.Arith(LUA_OPSUB) // ~/r[a]-r[a+2] vm.Replace(a) // ~ vm.AddPC(sBx) } // R(A)+=R(A+2); // if R(A) <?= R(A+1) then { // pc+=sBx; R(A+3)=R(A) // } func forLoop(i Instruction, vm LuaVM) { a, sBx := i.AsBx() a += 1 //vm.CheckStack(2) // R(A)+=R(A+2); vm.PushValue(a + 2) // ~/r[a+2] vm.PushValue(a) // ~/r[a+2]/r[a] vm.Arith(LUA_OPADD) // ~/r[a]+r[a+2] vm.Replace(a) // ~ isPositiveStep := vm.ToNumber(a+2) >= 0 if isPositiveStep && vm.Compare(a, a+1, LUA_OPLE) || !isPositiveStep && vm.Compare(a+1, a, LUA_OPLE) { // pc+=sBx; R(A+3)=R(A) vm.AddPC(sBx) vm.Copy(a, a+3) } }
FORPREP :
R(A) -= R(A+2);
pc += sBx
FORLOOP:
R(A)+=R(A+2);
if R(A) <?= R(A+1) then {
pc+=sBx; R(A+3)=R(A)
}
func (instr Instruction) Execute(vm api.LuaVM) { opcodes[instr.Opcode()].action(instr, vm) }
Lua 表本质上是关联数组, Associative Array, Dictionary, Map. 存放的是两两关联的键值对. 表有 Record 和 List (数字下标) 之分.
Lua 优化了作为数组和Map时的具体表现.
type luaTable struct { metatable *luaTable arr []luaValue _map map[luaValue]luaValue keys map[luaValue]luaValue // used by next() lastKey luaValue // used by next() changed bool // used by next() }
arr 存放数组, _map 存放 Map. 然后是初始化函数.
func newLuaTable(nArr, nRec int) *luaTable { t := &luaTable{} if nArr > 0 { t.arr = make([]luaValue, 0, nArr) } if nRec > 0 { t._map = make(map[luaValue]luaValue, nRec) } return t }
参数用于预估长度.
func (table *luaTable) get(key luaValue) luaValue { key = _floatToInteger(key) if idx, ok := key.(int64); ok { if idx >= 1 && idx <= int64(len(table.arr)) { return table.arr[idx-1] } } return table._map[key] }
先转换成整数然后查看是否在范围内, 然后进行查找值.
func _floatToInteger(key luaValue) luaValue { if f, ok := key.(float64); ok { if i, ok := number.FloatToInteger(f); ok { return i } } return key }
func (table *luaTable) put(key, val luaValue) { if key == nil { panic("table index is nil!") } if f, ok := key.(float64); ok && math.IsNaN(f) { panic("table index is NaN!") } table.changed = true key = _floatToInteger(key) if idx, ok := key.(int64); ok && idx >= 1 { arrLen := int64(len(table.arr)) if idx <= arrLen { table.arr[idx-1] = val if idx == arrLen && val == nil { table._shrinkArray() } return } if idx == arrLen+1 { delete(table._map, key) if val != nil { table.arr = append(table.arr, val) table._expandArray() } return } } if val != nil { if table._map == nil { table._map = make(map[luaValue]luaValue, 8) } table._map[key] = val } else { delete(table._map, key) } }
这里涉及了 Lua 表的特性.
函数调用有一些特性:
同样, 为了处理调用关系, 用了栈的结构对调用过程进行抽象.
type luaStack struct { /* virtual stack */ slots []luaValue top int /* call info */ state *luaState closure *closure varargs []luaValue openuvs map[int]*upvalue pc int /* linked list */ prev *luaStack }
新增了 prev, closure, varargs, pc.
然后定义 closure 结构体.
type closure struct { proto *binchunk.Prototype // lua closure goFunc GoFunction // go closure upvals []*upvalue }
proto 用于存放函数原型.
调用栈采用链表实现:
func (state *luaState) pushLuaStack(stack *luaStack) { stack.prev = state.stack state.stack = stack state.callDepth++ } func (state *luaState) popLuaStack() { stack := state.stack state.stack = stack.prev stack.prev = nil state.callDepth-- }
load() 方法加载二进制 chunk, 把主函数原型实例化为闭包并推入栈顶. 该方法同样可以加载 Lua 脚本.
解析流程为读取文件, 解析主函数原型, 实例化为闭包, 推入栈顶.
// [-0, +1, –] // http://www.lua.org/manual/5.3/manual.html#lua_load func (state *luaState) Load(chunk []byte, chunkName, mode string) (status ThreadStatus) { status = LUA_ERRSYNTAX // catch error defer func() { if r := recover(); r != nil { state.stack.push(_getErrObj(r)) } }() var proto *binchunk.Prototype if binchunk.IsBinaryChunk(chunk) { if mode == "t" { panic("attempt to load a binary chunk (mode is '" + mode + "')") } proto = binchunk.Undump(chunk) } else { if mode == "b" { panic("attempt to load a text chunk (mode is '" + mode + "')") } proto = compiler.Compile(string(chunk), chunkName) } c := newLuaClosure(proto) if len(proto.Upvalues) > 0 { env := state.registry.get(LUA_RIDX_GLOBALS) c.upvals[0] = &upvalue{&env} } state.stack.push(c) status = LUA_OK return }
将被调用函数推入栈顶, 接着推入函数参数, 即可调用 call(). 执行完毕后这些被弹出, 并将结果压入.
// [-(nargs+1), +nresults, e] // http://www.lua.org/manual/5.3/manual.html#lua_call func (state *luaState) Call(nArgs, nResults int) { val := state.stack.get(-(nArgs + 1)) c, ok := val.(*closure) if !ok { if mf := getMetafield(val, "__call", state); mf != nil { if c, ok = mf.(*closure); ok { state.stack.push(val) state.Insert(-(nArgs + 2)) nArgs += 1 } } } if ok { if c.proto != nil { state.callLuaClosure(nArgs, nResults, c) } else { state.callGoClosure(nArgs, nResults, c) } } else { typeName := state.TypeName(typeOf(val)) panic("attempt to call a " + typeName + " value") } }
state.callLuaClosure() 则充当了函数的实际调用者.
func (state *luaState) callLuaClosure(nArgs, nResults int, c *closure) { nRegs := int(c.proto.MaxStackSize) nParams := int(c.proto.NumParams) isVararg := c.proto.IsVararg == 1 // create new lua stack newStack := newLuaStack(nRegs+LUA_MINSTACK, state) newStack.closure = c // pass args, pop func funcAndArgs := state.stack.popN(nArgs + 1) newStack.pushN(funcAndArgs[1:], nParams) newStack.top = nRegs if nArgs > nParams && isVararg { newStack.varargs = funcAndArgs[nParams+1:] } // run closure state.pushLuaStack(newStack) state.runLuaClosure() state.popLuaStack() // return results if nResults != 0 { results := newStack.popN(newStack.top - nRegs) state.stack.check(len(results)) state.stack.pushN(results, nResults) } }
调用过程为创建新的 stack, 传入参数, 弹出目标函数, 运行 closure, 最后将结果放入 stack.
state.runLuaClosure() 为内部执行具体函数.
func (state *luaState) runLuaClosure() { for { inst := vm.Instruction(state.Fetch()) inst.Execute(state) // indent := fmt.Sprintf("%%%ds", state.callDepth*2) // fmt.Printf(indent+"[%02d: %s] => %s\n", // "", pc+1, inst.OpName(), state) if inst.Opcode() == vm.OP_RETURN { break } } }
这里的 state.Fetch() 根据 PC 获取当前指令, 然后 inst.Execute() 执行指令.
closure 指令, iBx 模式, 把当前函数的子函数原型实例化为闭包, 放入由操作数 A 指定的寄存器中.
// R(A) := closure(KPROTO[Bx]) func closure(i Instruction, vm LuaVM) { a, bx := i.ABx() a += 1 //vm.CheckStack(1) vm.LoadProto(bx) // ~/closure vm.Replace(a) // ~ }
call 指令, iABC 模式, 调用 Lua 函数.
// R(A), ... ,R(A+C-2) := R(A)(R(A+1), ... ,R(A+B-1)) func call(i Instruction, vm LuaVM) { a, b, c := i.ABC() a += 1 // println(":::"+ vm.StackToString()) nArgs := _pushFuncAndArgs(a, b, vm) vm.Call(nArgs, c-1) _popResults(a, c, vm) }
return 指令, iABC 模式, 把存放在连续多个寄存器里的返回值返回给主调函数.
// return R(A), ... ,R(A+B-2) func _return(i Instruction, vm LuaVM) { a, b, _ := i.ABC() a += 1 if b == 1 { // no return values } else if b > 1 { // b-1 return values vm.CheckStack(b - 1) for i := a; i <= a+b-2; i++ { vm.PushValue(i) } } else { _fixStack(a, vm) } }
其中第一个寄存器索引由操作数 A 指定, 寄存器数量由操作数 B 指定.
vararg 指令, iABC 模式, 把传递给当前函数的变长参数加载到连续多个寄存器中. 操作数 A 指定寄存器索引, 操作数 B 指定寄存器数量.
// R(A), R(A+1), ..., R(A+B-2) = vararg func vararg(i Instruction, vm LuaVM) { a, b, _ := i.ABC() a += 1 if b != 1 { // b==0 or b>1 vm.LoadVararg(b - 1) _popResults(a, b, vm) } }
return f(args) 会被优化成尾递归, 即 tailcall 指令.
// return R(A)(R(A+1), ... ,R(A+B-1)) func tailCall(i Instruction, vm LuaVM) { a, b, _ := i.ABC() a += 1 // todo: optimize tail call! c := 0 nArgs := _pushFuncAndArgs(a, b, vm) vm.Call(nArgs, c-1) _popResults(a, c, vm) }
self 指令, iABC 模式, 把对向和方法拷贝到相邻的两个目标寄存器中. 对象在寄存器中索引由操作数 B 指定. 方法名在常量表, 索引由操作数 C 指定. 目标寄存器索引由 A 指定.
// R(A+1) := R(B); R(A) := R(B)[RK(C)] func self(i Instruction, vm LuaVM) { a, b, c := i.ABC() a += 1 b += 1 vm.Copy(b, a+1) //vm.CheckStack(1) vm.GetRK(c) // ~/rk[c] vm.GetTable(b) // ~/r[b][rk[c]] vm.Replace(a) // ~ }
Lexer 结构体:
type Lexer struct { chunk string // source code chunkName string // source name line int // current line number nextToken string nextTokenKind int nextTokenLine int }
定义 NextToken():
func (lexer *Lexer) NextToken() (line, kind int, token string) { if lexer.nextTokenLine > 0 { line = lexer.nextTokenLine kind = lexer.nextTokenKind token = lexer.nextToken lexer.line = lexer.nextTokenLine lexer.nextTokenLine = 0 return } lexer.skipWhiteSpaces() if len(lexer.chunk) == 0 { return lexer.line, TOKEN_EOF, "EOF" } switch lexer.chunk[0] { case ';': lexer.next(1) return lexer.line, TOKEN_SEP_SEMI, ";" case ',': lexer.next(1) return lexer.line, TOKEN_SEP_COMMA, "," case '(': lexer.next(1) return lexer.line, TOKEN_SEP_LPAREN, "(" case ')': lexer.next(1) return lexer.line, TOKEN_SEP_RPAREN, ")" case ']': lexer.next(1) return lexer.line, TOKEN_SEP_RBRACK, "]" case '{': lexer.next(1) return lexer.line, TOKEN_SEP_LCURLY, "{" case '}': lexer.next(1) return lexer.line, TOKEN_SEP_RCURLY, "}" case '+': lexer.next(1) return lexer.line, TOKEN_OP_ADD, "+" case '-': lexer.next(1) return lexer.line, TOKEN_OP_MINUS, "-" case '*': lexer.next(1) return lexer.line, TOKEN_OP_MUL, "*" case '^': lexer.next(1) return lexer.line, TOKEN_OP_POW, "^" case '%': lexer.next(1) return lexer.line, TOKEN_OP_MOD, "%" case '&': lexer.next(1) return lexer.line, TOKEN_OP_BAND, "&" case '|': lexer.next(1) return lexer.line, TOKEN_OP_BOR, "|" case '#': lexer.next(1) return lexer.line, TOKEN_OP_LEN, "#" case ':': if lexer.test("::") { lexer.next(2) return lexer.line, TOKEN_SEP_LABEL, "::" } else { lexer.next(1) return lexer.line, TOKEN_SEP_COLON, ":" } case '/': if lexer.test("//") { lexer.next(2) return lexer.line, TOKEN_OP_IDIV, "//" } else { lexer.next(1) return lexer.line, TOKEN_OP_DIV, "/" } case '~': if lexer.test("~=") { lexer.next(2) return lexer.line, TOKEN_OP_NE, "~=" } else { lexer.next(1) return lexer.line, TOKEN_OP_WAVE, "~" } case '=': if lexer.test("==") { lexer.next(2) return lexer.line, TOKEN_OP_EQ, "==" } else { lexer.next(1) return lexer.line, TOKEN_OP_ASSIGN, "=" } case '<': if lexer.test("<<") { lexer.next(2) return lexer.line, TOKEN_OP_SHL, "<<" } else if lexer.test("<=") { lexer.next(2) return lexer.line, TOKEN_OP_LE, "<=" } else { lexer.next(1) return lexer.line, TOKEN_OP_LT, "<" } case '>': if lexer.test(">>") { lexer.next(2) return lexer.line, TOKEN_OP_SHR, ">>" } else if lexer.test(">=") { lexer.next(2) return lexer.line, TOKEN_OP_GE, ">=" } else { lexer.next(1) return lexer.line, TOKEN_OP_GT, ">" } case '.': if lexer.test("...") { lexer.next(3) return lexer.line, TOKEN_VARARG, "..." } else if lexer.test("..") { lexer.next(2) return lexer.line, TOKEN_OP_CONCAT, ".." } else if len(lexer.chunk) == 1 || !isDigit(lexer.chunk[1]) { lexer.next(1) return lexer.line, TOKEN_SEP_DOT, "." } case '[': if lexer.test("[[") || lexer.test("[=") { return lexer.line, TOKEN_STRING, lexer.scanLongString() } else { lexer.next(1) return lexer.line, TOKEN_SEP_LBRACK, "[" } case '\'', '"': return lexer.line, TOKEN_STRING, lexer.scanShortString() } c := lexer.chunk[0] if c == '.' || isDigit(c) { token := lexer.scanNumber() return lexer.line, TOKEN_NUMBER, token } if c == '_' || isLetter(c) { token := lexer.scanIdentifier() if kind, found := keywords[token]; found { return lexer.line, kind, token // keyword } else { return lexer.line, TOKEN_IDENTIFIER, token } } lexer.error("unexpected symbol near %q", c) return }
lexer.next(n) 用于跳过指定个数的字符. 检测到是指定的字符或字符序列后返回 token 类型.
func (lexer *Lexer) test(s string) bool { return strings.HasPrefix(lexer.chunk, s) }
func (lexer *Lexer) skipWhiteSpaces() { for len(lexer.chunk) > 0 { if lexer.test("--") { lexer.skipComment() } else if lexer.test("\r\n") || lexer.test("\n\r") { lexer.next(2) lexer.line += 1 } else if isNewLine(lexer.chunk[0]) { lexer.next(1) lexer.line += 1 } else if isWhiteSpace(lexer.chunk[0]) { lexer.next(1) } else { break } } }
跳过空白和注释.
func (lexer *Lexer) skipComment() { lexer.next(2) // skip -- // long comment ? if lexer.test("[") { if reOpeningLongBracket.FindString(lexer.chunk) != "" { lexer.scanLongString() return } } // short comment for len(lexer.chunk) > 0 && !isNewLine(lexer.chunk[0]) { lexer.next(1) } }
func (lexer *Lexer) scanLongString() string { openingLongBracket := reOpeningLongBracket.FindString(lexer.chunk) if openingLongBracket == "" { lexer.error("invalid long string delimiter near '%s'", lexer.chunk[0:2]) } closingLongBracket := strings.Replace(openingLongBracket, "[", "]", -1) closingLongBracketIdx := strings.Index(lexer.chunk, closingLongBracket) if closingLongBracketIdx < 0 { lexer.error("unfinished long string or comment") } str := lexer.chunk[len(openingLongBracket):closingLongBracketIdx] lexer.next(closingLongBracketIdx + len(closingLongBracket)) str = reNewLine.ReplaceAllString(str, "\n") lexer.line += strings.Count(str, "\n") if len(str) > 0 && str[0] == '\n' { str = str[1:] } return str }
即可以换行的字符串字面量的提取.
var reShortStr = regexp.MustCompile(`(?s)(^'(\\\\|\\'|\\\n|\\z\s*|[^'\n])*')|(^"(\\\\|\\"|\\\n|\\z\s*|[^"\n])*")`) func (lexer *Lexer) scanShortString() string { if str := reShortStr.FindString(lexer.chunk); str != "" { lexer.next(len(str)) str = str[1 : len(str)-1] if strings.Index(str, `\`) >= 0 { lexer.line += len(reNewLine.FindAllString(str, -1)) str = lexer.escape(str) } return str } lexer.error("unfinished string") return "" }
用正则匹配短字符串字面量.
func (lexer *Lexer) escape(str string) string { var buf bytes.Buffer for len(str) > 0 { if str[0] != '\\' { buf.WriteByte(str[0]) str = str[1:] continue } if len(str) == 1 { lexer.error("unfinished string") } switch str[1] { case 'a': buf.WriteByte('\a') str = str[2:] continue case 'b': buf.WriteByte('\b') str = str[2:] continue case 'f': buf.WriteByte('\f') str = str[2:] continue case 'n', '\n': buf.WriteByte('\n') str = str[2:] continue case 'r': buf.WriteByte('\r') str = str[2:] continue case 't': buf.WriteByte('\t') str = str[2:] continue case 'v': buf.WriteByte('\v') str = str[2:] continue case '"': buf.WriteByte('"') str = str[2:] continue case '\'': buf.WriteByte('\'') str = str[2:] continue case '\\': buf.WriteByte('\\') str = str[2:] continue case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9': // \ddd if found := reDecEscapeSeq.FindString(str); found != "" { d, _ := strconv.ParseInt(found[1:], 10, 32) if d <= 0xFF { buf.WriteByte(byte(d)) str = str[len(found):] continue } lexer.error("decimal escape too large near '%s'", found) } case 'x': // \xXX if found := reHexEscapeSeq.FindString(str); found != "" { d, _ := strconv.ParseInt(found[2:], 16, 32) buf.WriteByte(byte(d)) str = str[len(found):] continue } case 'u': // \u{XXX} if found := reUnicodeEscapeSeq.FindString(str); found != "" { d, err := strconv.ParseInt(found[3:len(found)-1], 16, 32) if err == nil && d <= 0x10FFFF { buf.WriteRune(rune(d)) str = str[len(found):] continue } lexer.error("UTF-8 value too large near '%s'", found) } case 'z': str = str[2:] for len(str) > 0 && isWhiteSpace(str[0]) { // todo str = str[1:] } continue } lexer.error("invalid escape sequence near '\\%c'", str[1]) } return buf.String() }
同样, 用正则表达式完成.
不跳过并判断下一个 token 是什么类型.
func (lexer *Lexer) LookAhead() int { if lexer.nextTokenLine > 0 { return lexer.nextTokenKind } currentLine := lexer.line line, kind, token := lexer.NextToken() lexer.line = currentLine lexer.nextTokenLine = line lexer.nextTokenKind = kind lexer.nextToken = token return kind }
通常用 EBNF (扩展巴科斯范式, Extended Backus-Naur Form) 描述.
chunk ::= block
block ::= {stat} [retstat]
retstat ::= return [explist] [';']
explist ::= exp {',' exp}
{A} 表示 0 或多次, [A] 表示 0 或 1 次.
Block 结构体:
package ast // chunk ::= block // type Chunk *Block // block ::= {stat} [retstat] // retstat ::= return [explist] [‘;’] // explist ::= exp {‘,’ exp} type Block struct { LastLine int Stats []Stat RetExps []Exp }
在命令式编程语言里, 语句 (statement) 的最基本的执行单位, 表达式 (expression) 则是构成语句的要素之一. 语句只能执行不能用于求值, 表达式只能用于求值不能单独执行.
stat ::= ';'
| varlist '=' explist
| functioncall
| label
| break
| goto Name
| do block end
| while exp do block end
| repeat block until end
| if exp then block {elseif exp then block} [else block] end
| for Name '=' exp ',' exp [',' exp] do block end
| for namelist in explist do block end
| function funcname funcbody
| local function Name funcbody
| local namelist ['=' explist]
type EmptyStat struct{} // ‘;’ type BreakStat struct{ Line int } // break type DoStat struct{ Block *Block } // do block end type FuncCallStat = FuncCallExp // functioncall
while exp do block end
repeat block until exp
// while exp do block end type WhileStat struct { Exp Exp Block *Block } // repeat block until exp type RepeatStat struct { Block *Block Exp Exp }
本是上都是包含一个 exp 和一个 block.
if exp then block {elseif exp then block} [else block] end
为了简化需要改写成:
if exp then block {elseif exp then block} [elseif true then block] end
如果把 elseif 块合并, EBNF 可以简化为
if exp then block {elseif exp then block} end
这个转化会在语法分析阶段进行.
// if exp then block {elseif exp then block} [else block] end type IfStat struct { Exps []Exp Blocks []*Block }
for Name '=' exp ',' exp [',' exp] do block end
// for Name ‘=’ exp ‘,’ exp [‘,’ exp] do block end type ForNumStat struct { LineOfFor int LineOfDo int VarName string InitExp Exp LimitExp Exp StepExp Exp Block *Block }
for namelist in explist do block end
namelist ::= Name {',' Name}
explist ::= exp {',' exp}
// for namelist in explist do block end // namelist ::= Name {‘,’ Name} // explist ::= exp {‘,’ exp} type ForInStat struct { LineOfDo int NameList []string ExpList []Exp Block *Block }
记录行号基本都是为了代码生成.
local namelist ['=' explist]
namelist ::= Name {',' Name}
explist ::= exp {',' exp}
// local namelist [‘=’ explist] // namelist ::= Name {‘,’ Name} // explist ::= exp {‘,’ exp} type LocalVarDeclStat struct { LastLine int NameList []string ExpList []Exp }
varlist ‘=’ explist
varlist ::= var {‘,’ var}
var ::= Name | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name
// varlist ‘=’ explist // varlist ::= var {‘,’ var} // var ::= Name | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name type AssignStat struct { LastLine int VarList []Exp ExpList []Exp }
function ::= function funcbody
funcname ::= Name {`.´ Name} [`:´ Name]
funcbody ::= `(´ [parlist] `)´ block end
local function Name funcbody
local function f (params) body end
会被转化为 local f; f = function (params) body end
// local function Name funcbody type LocalFuncDefStat struct { Name string Exp *FuncDefExp }
exp ::= nil | false | true | Number | String | `...´ | function | prefixexp | tableconstructor | exp binop exp | unop exp
type NilExp struct{ Line int } // nil type TrueExp struct{ Line int } // true type FalseExp struct{ Line int } // false type VarargExp struct{ Line int } // ... // Numeral type IntegerExp struct { Line int Val int64 } type FloatExp struct { Line int Val float64 } // LiteralString type StringExp struct { Line int Str string }
基本都包含行号, 有值的包含值.
// unop exp type UnopExp struct { Line int // line of operator Op int // operator Exp Exp } // exp1 op exp2 type BinopExp struct { Line int // line of operator Op int // operator Exp1 Exp Exp2 Exp } type ConcatExp struct { Line int // line of last .. Exps []Exp }
// tableconstructor ::= ‘{’ [fieldlist] ‘}’ // fieldlist ::= field {fieldsep field} [fieldsep] // field ::= ‘[’ exp ‘]’ ‘=’ exp | Name ‘=’ exp | exp // fieldsep ::= ‘,’ | ‘;’ type TableConstructorExp struct { Line int // line of `{` ? LastLine int // line of `}` KeyExps []Exp ValExps []Exp }
// functiondef ::= function funcbody // funcbody ::= ‘(’ [parlist] ‘)’ block end // parlist ::= namelist [‘,’ ‘...’] | ‘...’ // namelist ::= Name {‘,’ Name} type FuncDefExp struct { Line int LastLine int // line of `end` ParList []string IsVararg bool Block *Block }
/*
prefixexp ::= Name |
‘(’ exp ‘)’ |
prefixexp ‘[’ exp ‘]’ |
prefixexp ‘.’ Name |
prefixexp ‘:’ Name args |
prefixexp args
*/
type ParensExp struct { Exp Exp }
type TableAccessExp struct { LastLine int // line of `]` ? PrefixExp Exp KeyExp Exp }
functioncall ::= prefixexp args | prefixexp `:´ Name args
args ::= `(´ [explist] `)´ | tableconstructor | String
type FuncCallExp struct { Line int // line of `(` ? LastLine int // line of ')' PrefixExp Exp NameExp *StringExp Args []Exp }
这里是 lua 语法的 EBNF 汇总:
chunk = {stat [`;´]} [laststat [`;´]].
block ::= chunk
stat ::= varlist `=´ explist |
functioncall |
do block end |
while exp do block end |
repeat block until exp |
if exp then block {elseif exp then block} [else block] end |
for Name `=´ exp `,´ exp [`,´ exp] do block end |
for namelist in explist do block end |
function funcname funcbody |
local function Name funcbody |
local namelist [`=´ explist]
laststat ::= return [explist] | break
funcname ::= Name {`.´ Name} [`:´ Name]
varlist ::= var {`,´ var}
var ::= Name | prefixexp `[´ exp `]´ | prefixexp `.´ Name
namelist ::= Name {`,´ Name}
explist ::= {exp `,´} exp
exp ::= nil | false | true | Number | String | `...´ | function | prefixexp | tableconstructor | exp binop exp | unop exp
prefixexp ::= var | functioncall | `(´ exp `)´
functioncall ::= prefixexp args | prefixexp `:´ Name args
args ::= `(´ [explist] `)´ | tableconstructor | String
function ::= function funcbody
funcbody ::= `(´ [parlist] `)´ block end
parlist ::= namelist [`,´ `...´] | `...´
tableconstructor ::= `{´ [fieldlist] `}´
fieldlist ::= field {fieldsep field} [fieldsep]
field ::= `[´ exp `]´ `=´ exp | Name `=´ exp | exp
fieldsep ::= `,´ | `;´
binop ::= `+´ | `-´ | `*´ | `/´ | `^´ | `%´ | `..´ | `<´ | `<=´ | `>´ | `>=´ | `==´ | `~=´ | and | or
unop ::= `-´ | not | `#´
原始文档在: http://www.lua.org/manual/5.1/manual.html#8
讲了 C 语言 EBNF 描述中 if else 嵌套情况下的歧义问题. C 语言是通过定义 else 与最近的 if 结合来避免这个问题的.
Lua 二元运算符则是通过运算优先级解决歧义的.
如果上下文无关语言 L 不需要借助回溯就可以完成解析, 那么成为确定性 (Deterministic) 语言. 确定性语言一定没有歧义. 上下文无关包含无歧义包含确定性.
使用自顶向下 (Top-Down) 方法与使用自底向上 (Bottom-Up) 都可以构造语法树. 自底向上包括 LR 解析器和 CYK 解析器等. 自顶向下解析器包括 LL 解析器和***递归下降 (Recursive Descent Parser) 解析器***
Lua 脚本实际上就是个代码块, 解析结果应该是一个 Block 结构体实例.
// block ::= {stat} [retstat] func parseBlock(lexer *Lexer) *Block { return &Block{ Stats: parseStats(lexer), RetExps: parseRetExps(lexer), LastLine: lexer.Line(), } }
parseStats() 解析语句序列, parseRetExps() 解析可选的返回语句, 并记录末尾行号.
func parseStats(lexer *Lexer) []Stat { stats := make([]Stat, 0, 8) for !_isReturnOrBlockEnd(lexer.LookAhead()) { stat := parseStat(lexer) if _, ok := stat.(*EmptyStat); !ok { stats = append(stats, stat) } } return stats } func _isReturnOrBlockEnd(tokenKind int) bool { switch tokenKind { case TOKEN_KW_RETURN, TOKEN_EOF, TOKEN_KW_END, TOKEN_KW_ELSE, TOKEN_KW_ELSEIF, TOKEN_KW_UNTIL: return true } return false }
_isReturnOrBlockEnd 确定了 block 的结束.
// retstat ::= return [explist] [‘;’] // explist ::= exp {‘,’ exp} func parseRetExps(lexer *Lexer) []Exp { if lexer.LookAhead() != TOKEN_KW_RETURN { return nil } lexer.NextToken() switch lexer.LookAhead() { case TOKEN_EOF, TOKEN_KW_END, TOKEN_KW_ELSE, TOKEN_KW_ELSEIF, TOKEN_KW_UNTIL: return []Exp{} case TOKEN_SEP_SEMI: lexer.NextToken() return []Exp{} default: exps := parseExpList(lexer) if lexer.LookAhead() == TOKEN_SEP_SEMI { lexer.NextToken() } return exps } }
接下来是表达式的 parser
// explist ::= exp {‘,’ exp} func parseExpList(lexer *Lexer) []Exp { exps := make([]Exp, 0, 4) exps = append(exps, parseExp(lexer)) for lexer.LookAhead() == TOKEN_SEP_COMMA { lexer.NextToken() exps = append(exps, parseExp(lexer)) } return exps }
stat ::= varlist `=´ explist |
functioncall |
do block end |
while exp do block end |
repeat block until exp |
if exp then block {elseif exp then block} [else block] end |
for Name `=´ exp `,´ exp [`,´ exp] do block end |
for namelist in explist do block end |
function funcname funcbody |
local function Name funcbody |
local namelist [`=´ explist]
这里可以看到需要大量前瞻.
/* stat ::= ‘;’ | break | ‘::’ Name ‘::’ | goto Name | do block end | while exp do block end | repeat block until exp | if exp then block {elseif exp then block} [else block] end | for Name ‘=’ exp ‘,’ exp [‘,’ exp] do block end | for namelist in explist do block end | function funcname funcbody | local function Name funcbody | local namelist [‘=’ explist] | varlist ‘=’ explist | functioncall */ func parseStat(lexer *Lexer) Stat { switch lexer.LookAhead() { case TOKEN_SEP_SEMI: return parseEmptyStat(lexer) case TOKEN_KW_BREAK: return parseBreakStat(lexer) case TOKEN_SEP_LABEL: return parseLabelStat(lexer) case TOKEN_KW_GOTO: return parseGotoStat(lexer) case TOKEN_KW_DO: return parseDoStat(lexer) case TOKEN_KW_WHILE: return parseWhileStat(lexer) case TOKEN_KW_REPEAT: return parseRepeatStat(lexer) case TOKEN_KW_IF: return parseIfStat(lexer) case TOKEN_KW_FOR: return parseForStat(lexer) case TOKEN_KW_FUNCTION: return parseFuncDefStat(lexer) case TOKEN_KW_LOCAL: return parseLocalAssignOrFuncDefStat(lexer) default: return parseAssignOrFuncCallStat(lexer) } }
即进行一次前瞻, 然后尝试进行匹配.
包括 空语句, break, label goto, do, while, repeat.
// ; func parseEmptyStat(lexer *Lexer) *EmptyStat { lexer.NextTokenOfKind(TOKEN_SEP_SEMI) return _statEmpty } // break func parseBreakStat(lexer *Lexer) *BreakStat { lexer.NextTokenOfKind(TOKEN_KW_BREAK) return &BreakStat{lexer.Line()} } // ‘::’ Name ‘::’ func parseLabelStat(lexer *Lexer) *LabelStat { lexer.NextTokenOfKind(TOKEN_SEP_LABEL) // :: line, name := lexer.NextIdentifier() // name lexer.NextTokenOfKind(TOKEN_SEP_LABEL) // :: return &LabelStat{line, name} } // goto Name func parseGotoStat(lexer *Lexer) *GotoStat { lexer.NextTokenOfKind(TOKEN_KW_GOTO) // goto line, name := lexer.NextIdentifier() // name return &GotoStat{line, name} } // do block end func parseDoStat(lexer *Lexer) *DoStat { lexer.NextTokenOfKind(TOKEN_KW_DO) // do block := parseBlock(lexer) // block lexer.NextTokenOfKind(TOKEN_KW_END) // end return &DoStat{block} } // while exp do block end func parseWhileStat(lexer *Lexer) *WhileStat { lexer.NextTokenOfKind(TOKEN_KW_WHILE) // while exp := parseExp(lexer) // exp lexer.NextTokenOfKind(TOKEN_KW_DO) // do block := parseBlock(lexer) // block lexer.NextTokenOfKind(TOKEN_KW_END) // end return &WhileStat{exp, block} } // repeat block until exp func parseRepeatStat(lexer *Lexer) *RepeatStat { lexer.NextTokenOfKind(TOKEN_KW_REPEAT) // repeat block := parseBlock(lexer) // block lexer.NextTokenOfKind(TOKEN_KW_UNTIL) // until exp := parseExp(lexer) // exp return &RepeatStat{block, exp} }
具体处理逻辑.
这里的 parseBlock() 与 parseDoStat() 互相调用, 因此被称作递归下降.
// if exp then block {elseif exp then block} [else block] end type IfStat struct { Exps []Exp Blocks []*Block } // if exp then block {elseif exp then block} [else block] end func parseIfStat(lexer *Lexer) *IfStat { exps := make([]Exp, 0, 4) blocks := make([]*Block, 0, 4) lexer.NextTokenOfKind(TOKEN_KW_IF) // if exps = append(exps, parseExp(lexer)) // exp lexer.NextTokenOfKind(TOKEN_KW_THEN) // then blocks = append(blocks, parseBlock(lexer)) // block for lexer.LookAhead() == TOKEN_KW_ELSEIF { lexer.NextToken() // elseif exps = append(exps, parseExp(lexer)) // exp lexer.NextTokenOfKind(TOKEN_KW_THEN) // then blocks = append(blocks, parseBlock(lexer)) // block } // else block => elseif true then block if lexer.LookAhead() == TOKEN_KW_ELSE { lexer.NextToken() // else exps = append(exps, &TrueExp{lexer.Line()}) // blocks = append(blocks, parseBlock(lexer)) // block } lexer.NextTokenOfKind(TOKEN_KW_END) // end return &IfStat{exps, blocks} }
源码写的很流畅.
// for Name ‘=’ exp ‘,’ exp [‘,’ exp] do block end // for namelist in explist do block end func parseForStat(lexer *Lexer) Stat { lineOfFor, _ := lexer.NextTokenOfKind(TOKEN_KW_FOR) _, name := lexer.NextIdentifier() if lexer.LookAhead() == TOKEN_OP_ASSIGN { return _finishForNumStat(lexer, lineOfFor, name) } else { return _finishForInStat(lexer, name) } } // for Name ‘=’ exp ‘,’ exp [‘,’ exp] do block end func _finishForNumStat(lexer *Lexer, lineOfFor int, varName string) *ForNumStat { lexer.NextTokenOfKind(TOKEN_OP_ASSIGN) // for name = initExp := parseExp(lexer) // exp lexer.NextTokenOfKind(TOKEN_SEP_COMMA) // , limitExp := parseExp(lexer) // exp var stepExp Exp if lexer.LookAhead() == TOKEN_SEP_COMMA { lexer.NextToken() // , stepExp = parseExp(lexer) // exp } else { stepExp = &IntegerExp{lexer.Line(), 1} } lineOfDo, _ := lexer.NextTokenOfKind(TOKEN_KW_DO) // do block := parseBlock(lexer) // block lexer.NextTokenOfKind(TOKEN_KW_END) // end return &ForNumStat{lineOfFor, lineOfDo, varName, initExp, limitExp, stepExp, block} } // for namelist in explist do block end // namelist ::= Name {‘,’ Name} // explist ::= exp {‘,’ exp} func _finishForInStat(lexer *Lexer, name0 string) *ForInStat { nameList := _finishNameList(lexer, name0) // for namelist lexer.NextTokenOfKind(TOKEN_KW_IN) // in expList := parseExpList(lexer) // explist lineOfDo, _ := lexer.NextTokenOfKind(TOKEN_KW_DO) // do block := parseBlock(lexer) // block lexer.NextTokenOfKind(TOKEN_KW_END) // end return &ForInStat{lineOfDo, nameList, expList, block} } // namelist ::= Name {‘,’ Name} func _finishNameList(lexer *Lexer, name0 string) []string { names := []string{name0} for lexer.LookAhead() == TOKEN_SEP_COMMA { lexer.NextToken() // , _, name := lexer.NextIdentifier() // Name names = append(names, name) } return names }
即 local 语句.
// local function Name funcbody // local namelist [‘=’ explist] func parseLocalAssignOrFuncDefStat(lexer *Lexer) Stat { lexer.NextTokenOfKind(TOKEN_KW_LOCAL) if lexer.LookAhead() == TOKEN_KW_FUNCTION { return _finishLocalFuncDefStat(lexer) } else { return _finishLocalVarDeclStat(lexer) } }
即跳过 local 查看前面是 function 则是初始化局部函数, 如果是变量则初始化局部变量.
/* http://www.lua.org/manual/5.3/manual.html#3.4.11 function f() end => f = function() end function t.a.b.c.f() end => t.a.b.c.f = function() end function t.a.b.c:f() end => t.a.b.c.f = function(self) end local function f() end => local f; f = function() end The statement `local function f () body end` translates to `local f; f = function () body end` not to `local f = function () body end` (This only makes a difference when the body of the function contains references to f.) */ // local function Name funcbody func _finishLocalFuncDefStat(lexer *Lexer) *LocalFuncDefStat { lexer.NextTokenOfKind(TOKEN_KW_FUNCTION) // local function _, name := lexer.NextIdentifier() // name fdExp := parseFuncDefExp(lexer) // funcbody return &LocalFuncDefStat{name, fdExp} } // local namelist [‘=’ explist] func _finishLocalVarDeclStat(lexer *Lexer) *LocalVarDeclStat { _, name0 := lexer.NextIdentifier() // local Name nameList := _finishNameList(lexer, name0) // { , Name } var expList []Exp = nil if lexer.LookAhead() == TOKEN_OP_ASSIGN { lexer.NextToken() // == expList = parseExpList(lexer) // explist } lastLine := lexer.Line() return &LocalVarDeclStat{lastLine, nameList, expList} }
// varlist ‘=’ explist // functioncall func parseAssignOrFuncCallStat(lexer *Lexer) Stat { prefixExp := parsePrefixExp(lexer) if fc, ok := prefixExp.(*FuncCallExp); ok { return fc } else { return parseAssignStat(lexer, prefixExp) } }
// varlist ‘=’ explist | func parseAssignStat(lexer *Lexer, var0 Exp) *AssignStat { varList := _finishVarList(lexer, var0) // varlist lexer.NextTokenOfKind(TOKEN_OP_ASSIGN) // = expList := parseExpList(lexer) // explist lastLine := lexer.Line() return &AssignStat{lastLine, varList, expList} } // varlist ::= var {‘,’ var} func _finishVarList(lexer *Lexer, var0 Exp) []Exp { vars := []Exp{_checkVar(lexer, var0)} // var for lexer.LookAhead() == TOKEN_SEP_COMMA { // { lexer.NextToken() // , exp := parsePrefixExp(lexer) // var vars = append(vars, _checkVar(lexer, exp)) // } // } return vars } // var ::= Name | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name func _checkVar(lexer *Lexer, exp Exp) Exp { switch exp.(type) { case *NameExp, *TableAccessExp: return exp } lexer.NextTokenOfKind(-1) // trigger error panic("unreachable!") }
// function funcname funcbody // funcname ::= Name {‘.’ Name} [‘:’ Name] // funcbody ::= ‘(’ [parlist] ‘)’ block end // parlist ::= namelist [‘,’ ‘...’] | ‘...’ // namelist ::= Name {‘,’ Name} func parseFuncDefStat(lexer *Lexer) *AssignStat { lexer.NextTokenOfKind(TOKEN_KW_FUNCTION) // function fnExp, hasColon := _parseFuncName(lexer) // funcname fdExp := parseFuncDefExp(lexer) // funcbody if hasColon { // insert self fdExp.ParList = append(fdExp.ParList, "") copy(fdExp.ParList[1:], fdExp.ParList) fdExp.ParList[0] = "self" } return &AssignStat{ LastLine: fdExp.Line, VarList: []Exp{fnExp}, ExpList: []Exp{fdExp}, } } // funcname ::= Name {‘.’ Name} [‘:’ Name] func _parseFuncName(lexer *Lexer) (exp Exp, hasColon bool) { line, name := lexer.NextIdentifier() exp = &NameExp{line, name} for lexer.LookAhead() == TOKEN_SEP_DOT { lexer.NextToken() line, name := lexer.NextIdentifier() idx := &StringExp{line, name} exp = &TableAccessExp{line, exp, idx} } if lexer.LookAhead() == TOKEN_SEP_COLON { lexer.NextToken() line, name := lexer.NextIdentifier() idx := &StringExp{line, name} exp = &TableAccessExp{line, exp, idx} hasColon = true } return }
将表达式解析分解:
/* exp ::= nil | false | true | Numeral | LiteralString | ‘...’ | functiondef | prefixexp | tableconstructor | exp binop exp | unop exp */ /* exp ::= exp12 exp12 ::= exp11 {or exp11} exp11 ::= exp10 {and exp10} exp10 ::= exp9 {(‘<’ | ‘>’ | ‘<=’ | ‘>=’ | ‘~=’ | ‘==’) exp9} exp9 ::= exp8 {‘|’ exp8} exp8 ::= exp7 {‘~’ exp7} exp7 ::= exp6 {‘&’ exp6} exp6 ::= exp5 {(‘<<’ | ‘>>’) exp5} exp5 ::= exp4 {‘..’ exp4} exp4 ::= exp3 {(‘+’ | ‘-’) exp3} exp3 ::= exp2 {(‘*’ | ‘/’ | ‘//’ | ‘%’) exp2} exp2 ::= {(‘not’ | ‘#’ | ‘-’ | ‘~’)} exp1 exp1 ::= exp0 {‘^’ exp2} exp0 ::= nil | false | true | Numeral | LiteralString | ‘...’ | functiondef | prefixexp | tableconstructor */ func parseExp(lexer *Lexer) Exp { return parseExp12(lexer) }
// x or y func parseExp12(lexer *Lexer) Exp { exp := parseExp11(lexer) for lexer.LookAhead() == TOKEN_OP_OR { line, op, _ := lexer.NextToken() lor := &BinopExp{line, op, exp, parseExp11(lexer)} exp = optimizeLogicalOr(lor) } return exp }
这里要根据运算符的结合性书写具体逻辑. 尤其是 ...
和 ^
具有向右结合性. 下面是 ^
的具体例子:
// x ^ y func parseExp1(lexer *Lexer) Exp { // pow is right associative exp := parseExp0(lexer) if lexer.LookAhead() == TOKEN_OP_POW { line, op, _ := lexer.NextToken() exp = &BinopExp{line, op, exp, parseExp2(lexer)} } return optimizePow(exp) } func optimizePow(exp Exp) Exp { if binop, ok := exp.(*BinopExp); ok { if binop.Op == TOKEN_OP_POW { binop.Exp2 = optimizePow(binop.Exp2) } return optimizeArithBinaryOp(binop) } return exp }
^
是可以连用的, 比如 a^b^c
.
下面是一元运算符的表达式解析:
// exp2 ::= {(‘not’ | ‘#’ | ‘-’ | ‘~’)} exp1 // unary func parseExp2(lexer *Lexer) Exp { switch lexer.LookAhead() { case TOKEN_OP_UNM, TOKEN_OP_BNOT, TOKEN_OP_LEN, TOKEN_OP_NOT: line, op, _ := lexer.NextToken() exp := &UnopExp{line, op, parseExp2(lexer)} return optimizeUnaryOp(exp) } return parseExp1(lexer) }
拼接运算符则是:
// a .. b func parseExp5(lexer *Lexer) Exp { exp := parseExp4(lexer) if lexer.LookAhead() != TOKEN_OP_CONCAT { return exp } line := 0 exps := []Exp{exp} for lexer.LookAhead() == TOKEN_OP_CONCAT { line, _, _ = lexer.NextToken() exps = append(exps, parseExp4(lexer)) } return &ConcatExp{line, exps} }
拼接运算符特殊处理, 采用多叉树方便统一运算。
func parseExp0(lexer *Lexer) Exp { switch lexer.LookAhead() { case TOKEN_VARARG: // ... line, _, _ := lexer.NextToken() return &VarargExp{line} case TOKEN_KW_NIL: // nil line, _, _ := lexer.NextToken() return &NilExp{line} case TOKEN_KW_TRUE: // true line, _, _ := lexer.NextToken() return &TrueExp{line} case TOKEN_KW_FALSE: // false line, _, _ := lexer.NextToken() return &FalseExp{line} case TOKEN_STRING: // LiteralString line, _, token := lexer.NextToken() return &StringExp{line, token} case TOKEN_NUMBER: // Numeral return parseNumberExp(lexer) case TOKEN_SEP_LCURLY: // tableconstructor return parseTableConstructorExp(lexer) case TOKEN_KW_FUNCTION: // functiondef lexer.NextToken() return parseFuncDefExp(lexer) default: // prefixexp return parsePrefixExp(lexer) } }
单个表达式都很简单, 直接判断类型然后返回类型即可. 其中数字字面量会复杂一些.
func parseNumberExp(lexer *Lexer) Exp { line, _, token := lexer.NextToken() if i, ok := number.ParseInteger(token); ok { return &IntegerExp{line, i} } else if f, ok := number.ParseFloat(token); ok { return &FloatExp{line, f} } else { // todo panic("not a number: " + token) } }
// functiondef ::= function funcbody // funcbody ::= ‘(’ [parlist] ‘)’ block end func parseFuncDefExp(lexer *Lexer) *FuncDefExp { line := lexer.Line() // function lexer.NextTokenOfKind(TOKEN_SEP_LPAREN) // ( parList, isVararg := _parseParList(lexer) // [parlist] lexer.NextTokenOfKind(TOKEN_SEP_RPAREN) // ) block := parseBlock(lexer) // block lastLine, _ := lexer.NextTokenOfKind(TOKEN_KW_END) // end return &FuncDefExp{line, lastLine, parList, isVararg, block} }
_parseParList() 用于参数列表解析.
// [parlist] // parlist ::= namelist [‘,’ ‘...’] | ‘...’ func _parseParList(lexer *Lexer) (names []string, isVararg bool) { switch lexer.LookAhead() { case TOKEN_SEP_RPAREN: return nil, false case TOKEN_VARARG: lexer.NextToken() return nil, true } _, name := lexer.NextIdentifier() names = append(names, name) for lexer.LookAhead() == TOKEN_SEP_COMMA { lexer.NextToken() if lexer.LookAhead() == TOKEN_IDENTIFIER { _, name := lexer.NextIdentifier() names = append(names, name) } else { lexer.NextTokenOfKind(TOKEN_VARARG) isVararg = true break } } return }
// tableconstructor ::= ‘{’ [fieldlist] ‘}’ func parseTableConstructorExp(lexer *Lexer) *TableConstructorExp { line := lexer.Line() lexer.NextTokenOfKind(TOKEN_SEP_LCURLY) // { keyExps, valExps := _parseFieldList(lexer) // [fieldlist] lexer.NextTokenOfKind(TOKEN_SEP_RCURLY) // } lastLine := lexer.Line() return &TableConstructorExp{line, lastLine, keyExps, valExps} } // fieldlist ::= field {fieldsep field} [fieldsep] func _parseFieldList(lexer *Lexer) (ks, vs []Exp) { if lexer.LookAhead() != TOKEN_SEP_RCURLY { k, v := _parseField(lexer) ks = append(ks, k) vs = append(vs, v) for _isFieldSep(lexer.LookAhead()) { lexer.NextToken() if lexer.LookAhead() != TOKEN_SEP_RCURLY { k, v := _parseField(lexer) ks = append(ks, k) vs = append(vs, v) } else { break } } } return } // fieldsep ::= ‘,’ | ‘;’ func _isFieldSep(tokenKind int) bool { return tokenKind == TOKEN_SEP_COMMA || tokenKind == TOKEN_SEP_SEMI } // field ::= ‘[’ exp ‘]’ ‘=’ exp | Name ‘=’ exp | exp func _parseField(lexer *Lexer) (k, v Exp) { if lexer.LookAhead() == TOKEN_SEP_LBRACK { lexer.NextToken() // [ k = parseExp(lexer) // exp lexer.NextTokenOfKind(TOKEN_SEP_RBRACK) // ] lexer.NextTokenOfKind(TOKEN_OP_ASSIGN) // = v = parseExp(lexer) // exp return } exp := parseExp(lexer) if nameExp, ok := exp.(*NameExp); ok { if lexer.LookAhead() == TOKEN_OP_ASSIGN { // Name ‘=’ exp => ‘[’ LiteralString ‘]’ = exp lexer.NextToken() k = &StringExp{nameExp.Line, nameExp.Name} v = parseExp(lexer) return } } return nil, exp }
// prefixexp ::= var | functioncall | ‘(’ exp ‘)’ // var ::= Name | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name // functioncall ::= prefixexp args | prefixexp ‘:’ Name args /* prefixexp ::= Name | ‘(’ exp ‘)’ | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name | prefixexp [‘:’ Name] args */ func parsePrefixExp(lexer *Lexer) Exp { var exp Exp if lexer.LookAhead() == TOKEN_IDENTIFIER { line, name := lexer.NextIdentifier() // Name exp = &NameExp{line, name} } else { // ‘(’ exp ‘)’ exp = parseParensExp(lexer) } return _finishPrefixExp(lexer, exp) } func _finishPrefixExp(lexer *Lexer, exp Exp) Exp { for { switch lexer.LookAhead() { case TOKEN_SEP_LBRACK: // prefixexp ‘[’ exp ‘]’ lexer.NextToken() // ‘[’ keyExp := parseExp(lexer) // exp lexer.NextTokenOfKind(TOKEN_SEP_RBRACK) // ‘]’ exp = &TableAccessExp{lexer.Line(), exp, keyExp} case TOKEN_SEP_DOT: // prefixexp ‘.’ Name lexer.NextToken() // ‘.’ line, name := lexer.NextIdentifier() // Name keyExp := &StringExp{line, name} exp = &TableAccessExp{line, exp, keyExp} case TOKEN_SEP_COLON, // prefixexp ‘:’ Name args TOKEN_SEP_LPAREN, TOKEN_SEP_LCURLY, TOKEN_STRING: // prefixexp args exp = _finishFuncCallExp(lexer, exp) default: return exp } } return exp }
func parseParensExp(lexer *Lexer) Exp { lexer.NextTokenOfKind(TOKEN_SEP_LPAREN) // ( exp := parseExp(lexer) // exp lexer.NextTokenOfKind(TOKEN_SEP_RPAREN) // ) switch exp.(type) { case *VarargExp, *FuncCallExp, *NameExp, *TableAccessExp: return &ParensExp{exp} } // no need to keep parens return exp }
运算优先级调节应该是放到了代码生成阶段.
// functioncall ::= prefixexp args | prefixexp ‘:’ Name args func _finishFuncCallExp(lexer *Lexer, prefixExp Exp) *FuncCallExp { nameExp := _parseNameExp(lexer) line := lexer.Line() // todo args := _parseArgs(lexer) lastLine := lexer.Line() return &FuncCallExp{line, lastLine, prefixExp, nameExp, args} } func _parseNameExp(lexer *Lexer) *StringExp { if lexer.LookAhead() == TOKEN_SEP_COLON { lexer.NextToken() line, name := lexer.NextIdentifier() return &StringExp{line, name} } return nil } // args ::= ‘(’ [explist] ‘)’ | tableconstructor | LiteralString func _parseArgs(lexer *Lexer) (args []Exp) { switch lexer.LookAhead() { case TOKEN_SEP_LPAREN: // ‘(’ [explist] ‘)’ lexer.NextToken() // TOKEN_SEP_LPAREN if lexer.LookAhead() != TOKEN_SEP_RPAREN { args = parseExpList(lexer) } lexer.NextTokenOfKind(TOKEN_SEP_RPAREN) case TOKEN_SEP_LCURLY: // ‘{’ [fieldlist] ‘}’ args = []Exp{parseTableConstructorExp(lexer)} default: // LiteralString line, str := lexer.NextTokenOfKind(TOKEN_STRING) args = []Exp{&StringExp{line, str}} } return }
具体而言, 优化包括字面量参与的算术, 按位, 逻辑运算符表达式进行优化.
即运算符表达式如果能在编译期进行求值, 会直接进行求值.
一元运算符优化的例子:
// unary func parseExp2(lexer *Lexer) Exp { switch lexer.LookAhead() { case TOKEN_OP_UNM, TOKEN_OP_BNOT, TOKEN_OP_LEN, TOKEN_OP_NOT: line, op, _ := lexer.NextToken() exp := &UnopExp{line, op, parseExp2(lexer)} return optimizeUnaryOp(exp) } return parseExp1(lexer) } func optimizeUnaryOp(exp *UnopExp) Exp { switch exp.Op { case TOKEN_OP_UNM: return optimizeUnm(exp) case TOKEN_OP_NOT: return optimizeNot(exp) case TOKEN_OP_BNOT: return optimizeBnot(exp) default: return exp } }
具体到取负值的优化是:
func optimizeUnm(exp *UnopExp) Exp { switch x := exp.Exp.(type) { // number? case *IntegerExp: x.Val = -x.Val return x case *FloatExp: if x.Val != 0 { x.Val = -x.Val return x } } return exp }
最后是整体的 Parser 入口.
func Parse(chunk, chunkName string) *Block { lexer := NewLexer(chunk, chunkName) block := parseBlock(lexer) lexer.NextTokenOfKind(TOKEN_EOF) return block }
代码生成会分为两个阶段, 第一阶段对AST进行处理, 生成自定义内部结构, 第二个阶段把内部结构转换为函数原型.
type funcInfo struct { codeBuf parent *funcInfo subFuncs []*funcInfo usedRegs int maxRegs int scopeLv int locVars []*locVarInfo locNames map[string]*locVarInfo upvalues map[string]upvalInfo constants map[interface{}]int labels map[string]labelInfo gotos []*gotoInfo breaks [][]int line int lastLine int numParams int isVararg bool }
constants 字段负责存储函数原型的常量表. 接下来是 indexOfConstant().
func (fi *funcInfo) indexOfConstant(k interface{}) int { if idx, found := fi.constants[k]; found { return idx } idx := len(fi.constants) fi.constants[k] = idx return idx }
如果常量不在列表则会塞进去然后返回索引.
type funcInfo struct { codeBuf usedRegs int maxRegs int }
仅仅记录已分配的寄存器数量和最大寄存器数量即可. allocReg() 负责分配并更新最大寄存器数量, 并返回寄存器索引.
func (fi *funcInfo) allocReg() int { fi.usedRegs++ if fi.usedRegs >= 255 { panic("function or expression needs too many registers") } if fi.usedRegs > fi.maxRegs { fi.maxRegs = fi.usedRegs } return fi.usedRegs - 1 }
freeReg() 用于回收.
func (fi *funcInfo) freeReg() { if fi.usedRegs <= 0 { panic("usedRegs <= 0 !") } fi.usedRegs-- }
allocRegs() 用于连续分配 n 个寄存器并返回第一个索引, freeRegs 则负责回收.
func (fi *funcInfo) allocRegs(n int) int { if n <= 0 { panic("n <= 0 !") } for i := 0; i < n; i++ { fi.allocReg() } return fi.usedRegs - n } func (fi *funcInfo) freeRegs(n int) { if n < 0 { panic("n < 0 !") } for i := 0; i < n; i++ { fi.freeReg() } }
这里用单链表来串联同名的局部变量. 即 locVarInfo.
type locVarInfo struct { prev *locVarInfo name string scopeLv int slot int startPC int endPC int captured bool }
name 为局部变量名, scopeLv 记录局部变量所在的作用于层次, slot 字段记录与局部变量名绑定的寄存器索引, captured 表示局部变量是否被闭包捕获.
type funcInfo struct { codeBuf parent *funcInfo subFuncs []*funcInfo usedRegs int maxRegs int scopeLv int locVars []*locVarInfo locNames map[string]*locVarInfo upvalues map[string]upvalInfo constants map[interface{}]int labels map[string]labelInfo gotos []*gotoInfo breaks [][]int line int lastLine int numParams int isVararg bool }
在 funcInfo 中 scopeLv 记录当前作用域层次, locVars 按顺序记录函数内部声明的全部局部变量. locNames 记录当前生效的局部变量.
enterScope() 用于进入新的作用域.
func (fi *funcInfo) enterScope(breakable bool) { fi.scopeLv++ if breakable { fi.breaks = append(fi.breaks, []int{}) } else { fi.breaks = append(fi.breaks, nil) } }
addLocVar() 方法在当前作用域添加一个局部变量, 返回其分配的寄存器索引.
func (fi *funcInfo) addLocVar(name string, startPC int) int { newVar := &locVarInfo{ name: name, prev: fi.locNames[name], scopeLv: fi.scopeLv, slot: fi.allocReg(), // alloc new register startPC: startPC, endPC: 0, } fi.locVars = append(fi.locVars, newVar) fi.locNames[name] = newVar return newVar.slot }
slotOfLocVar() 方法检查局部变量名是否已经和某个寄存器绑定, 如果是则返回寄存器索引, 否则返回 -1.
func (fi *funcInfo) slotOfLocVar(name string) int { if locVar, found := fi.locNames[name]; found { return locVar.slot } return -1 }
exitScope() 用于退出作用于.
func (fi *funcInfo) exitScope(endPC int) { pendingBreakJmps := fi.breaks[len(fi.breaks)-1] fi.breaks = fi.breaks[:len(fi.breaks)-1] a := fi.getJmpArgA() for _, pc := range pendingBreakJmps { sBx := fi.pc() - pc i := (sBx+MAXARG_sBx)<<14 | a<<6 | OP_JMP fi.insts[pc] = uint32(i) } fi.fixGotoJmps() fi.scopeLv-- for _, locVar := range fi.locNames { if locVar.scopeLv > fi.scopeLv { // out of scope locVar.endPC = endPC fi.removeLocVar(locVar) } } }
判断离开了当前作用域后, 会移除 locVar. 使用 removeLocVar()
func (fi *funcInfo) removeLocVar(locVar *locVarInfo) { fi.freeReg() if locVar.prev == nil { delete(fi.locNames, locVar.name) } else if locVar.prev.scopeLv == locVar.scopeLv { fi.removeLocVar(locVar.prev) } else { fi.locNames[locVar.name] = locVar.prev } }
涉及到移除, 同名重新绑定的过程.
Break 可以打断循环, 处理难点有:
所以为了解决, 需要把跳转指令的地址记录在对应的 for, repeat, while 块里, 结束后再修复跳转的目标地址.
type funcInfo struct { breaks [][]int }
这里使用数组来记录循环块内待处理的跳转指令, 数字长度和块的深度对应. 通过判断是否为 nil 就可以判断对应的块是否是循环块. 这里需要修改 enterScope()
func (fi *funcInfo) enterScope(breakable bool) { fi.scopeLv++ if breakable { fi.breaks = append(fi.breaks, []int{}) } else { fi.breaks = append(fi.breaks, nil) } }
addBreakJmp() 方法把 break 语句对应的跳转指令添加到最近的循环块里, 找不到则 painc().
func (fi *funcInfo) addBreakJmp(pc int) { for i := fi.scopeLv; i >= 0; i-- { if fi.breaks[i] != nil { // breakable fi.breaks[i] = append(fi.breaks[i], pc) return } } panic("<break> at line ? not inside a loop!") }
exitScope() 也要增加相关代码:
func (fi *funcInfo) exitScope(endPC int) { pendingBreakJmps := fi.breaks[len(fi.breaks)-1] fi.breaks = fi.breaks[:len(fi.breaks)-1] a := fi.getJmpArgA() for _, pc := range pendingBreakJmps { sBx := fi.pc() - pc i := (sBx+MAXARG_sBx)<<14 | a<<6 | OP_JMP fi.insts[pc] = uint32(i) } fi.fixGotoJmps() fi.scopeLv-- for _, locVar := range fi.locNames { if locVar.scopeLv > fi.scopeLv { // out of scope locVar.endPC = endPC fi.removeLocVar(locVar) } } }
简单来讲, 在 enterScope 过程将 fi.breaks 增加, 然后 addBreakJmp 负责将跳转地址 (pc) 怼到当前 fi.breaks 的位置里面. 当 exitScope 时, 则进行递减流程.
即在当前作用域捕获外围函数中的局部变量. Upvalue 名只能绑定唯一的 Upvalue. 所以结构相对简单.
type upvalInfo struct { locVarSlot int upvalIndex int index int }
locVarSlot 记录该局部变量占用的寄存器索引, 如果为空则已经被外围函数捕获. upvalIndex 记录该 Upvalue 在直接外围函数 Upvalue 表中的索引. index 记录 Upvalue 在函数中出现的顺序.
type funcInfo struct { codeBuf parent *funcInfo upvalues map[string]upvalInfo }
upvalues 即 Upvalue 表. parent 用于定位外围函数的局部变量表和 Upvalue 表.
func (fi *funcInfo) indexOfUpval(name string) int { if upval, ok := fi.upvalues[name]; ok { return upval.index } if fi.parent != nil { if locVar, found := fi.parent.locNames[name]; found { idx := len(fi.upvalues) fi.upvalues[name] = upvalInfo{locVar.slot, -1, idx} locVar.captured = true return idx } if uvIdx := fi.parent.indexOfUpval(name); uvIdx >= 0 { idx := len(fi.upvalues) fi.upvalues[name] = upvalInfo{-1, uvIdx, idx} return idx } } return -1 }
indexOfUpval 用于实施绑定并返回索引.
type codeBuf struct { insts []uint32 lineNums []uint32 }
insts 存储字节码.
func (cb *codeBuf) emitABC(line, opcode, a, b, c int) { i := b<<23 | c<<14 | a<<6 | opcode cb.insts = append(cb.insts, uint32(i)) cb.lineNums = append(cb.lineNums, uint32(line)) } func (cb *codeBuf) emitABx(line, opcode, a, bx int) { i := bx<<14 | a<<6 | opcode cb.insts = append(cb.insts, uint32(i)) cb.lineNums = append(cb.lineNums, uint32(line)) } func (cb *codeBuf) emitAsBx(line, opcode, a, sBx int) { i := (sBx+MAXARG_sBx)<<14 | a<<6 | opcode cb.insts = append(cb.insts, uint32(i)) cb.lineNums = append(cb.lineNums, uint32(line)) } func (cb *codeBuf) emitAx(line, opcode, ax int) { i := ax<<6 | opcode cb.insts = append(cb.insts, uint32(i)) cb.lineNums = append(cb.lineNums, uint32(line)) }
这段代码用于将 line, opcode, abc 寄存器拼接成指令并放入 insts.
func (cb *codeBuf) fixSbx(pc, sBx int) { if sBx > 0 && sBx > MAXARG_sBx || sBx < 0 && -sBx > MAXARG_sBx { panic("control structure too long") } i := cb.insts[pc] i = i << 18 >> 18 // clear sBx i = i | uint32(sBx+MAXARG_sBx)<<14 // reset sBx cb.insts[pc] = i }
定义 funcInfo 的生成函数.
func newFuncInfo(parent *funcInfo, fd *FuncDefExp) *funcInfo { return &funcInfo{ codeBuf: codeBuf{ insts: make([]uint32, 0, 8), lineNums: make([]uint32, 0, 8), }, parent: parent, subFuncs: []*funcInfo{}, locVars: make([]*locVarInfo, 0, 8), locNames: map[string]*locVarInfo{}, upvalues: map[string]upvalInfo{}, constants: map[interface{}]int{}, labels: map[string]labelInfo{}, gotos: nil, breaks: make([][]int, 1), line: fd.Line, lastLine: fd.LastLine, numParams: len(fd.ParList), isVararg: fd.IsVararg, } }
func cgBlock(fi *funcInfo, node *Block) { for _, stat := range node.Stats { cgStat(fi, stat) } if node.RetExps != nil { cgRetStat(fi, node.RetExps, node.LastLine) } } func cgRetStat(fi *funcInfo, exps []Exp, lastLine int) { nExps := len(exps) if nExps == 0 { fi.emitReturn(lastLine, 0, 0) return } if nExps == 1 { if nameExp, ok := exps[0].(*NameExp); ok { if r := fi.slotOfLocVar(nameExp.Name); r >= 0 { fi.emitReturn(lastLine, r, 1) return } } if fcExp, ok := exps[0].(*FuncCallExp); ok { r := fi.allocReg() cgTailCallExp(fi, fcExp, r) fi.freeReg() fi.emitReturn(lastLine, r, -1) return } } multRet := isVarargOrFuncCall(exps[nExps-1]) for i, exp := range exps { r := fi.allocReg() if i == nExps-1 && multRet { cgExp(fi, exp, r, -1) } else { cgExp(fi, exp, r, 1) } } fi.freeRegs(nExps) a := fi.usedRegs // correct? if multRet { fi.emitReturn(lastLine, a, -1) } else { fi.emitReturn(lastLine, a, nExps) } }
接下来时语句的编译过程.
func cgStat(fi *funcInfo, node Stat) { switch stat := node.(type) { case *FuncCallStat: cgFuncCallStat(fi, stat) case *BreakStat: cgBreakStat(fi, stat) case *DoStat: cgDoStat(fi, stat) case *WhileStat: cgWhileStat(fi, stat) case *RepeatStat: cgRepeatStat(fi, stat) case *IfStat: cgIfStat(fi, stat) case *ForNumStat: cgForNumStat(fi, stat) case *ForInStat: cgForInStat(fi, stat) case *AssignStat: cgAssignStat(fi, stat) case *LocalVarDeclStat: cgLocalVarDeclStat(fi, stat) case *LocalFuncDefStat: cgLocalFuncDefStat(fi, stat) case *LabelStat: cgLabelStat(fi, stat) case *GotoStat: cgGotoStat(fi, stat) } }
local function f() end
等价于 local f; f=function() end
func cgLocalFuncDefStat(fi *funcInfo, node *LocalFuncDefStat) { r := fi.addLocVar(node.Name, fi.pc()+2) cgFuncDefExp(fi, node.Exp, r) } func cgFuncCallStat(fi *funcInfo, node *FuncCallStat) { r := fi.allocReg() cgFuncCallExp(fi, node, r, 0) fi.freeReg() } func cgBreakStat(fi *funcInfo, node *BreakStat) { pc := fi.emitJmp(node.Line, 0, 0) fi.addBreakJmp(pc) } func cgDoStat(fi *funcInfo, node *DoStat) { fi.enterScope(false) cgBlock(fi, node.Block) fi.closeOpenUpvals(node.Block.LastLine) fi.exitScope(fi.pc() + 1) }
生成过程大部分都是 addLocVar 声明局部变量, allocReg 申请寄存器, enterScope, 进入新的作用域等.
closeOpenUpvals() 负责 Upvalue 闭合, getJmpArgA 负责获取跳转地址 A.
func (fi *funcInfo) closeOpenUpvals(line int) { a := fi.getJmpArgA() if a > 0 { fi.emitJmp(line, a, 0) } } func (fi *funcInfo) getJmpArgA() int { hasCapturedLocVars := false minSlotOfLocVars := fi.maxRegs for _, locVar := range fi.locNames { if locVar.scopeLv == fi.scopeLv { for v := locVar; v != nil && v.scopeLv == fi.scopeLv; v = v.prev { if v.captured { hasCapturedLocVars = true } if v.slot < minSlotOfLocVars && v.name[0] != '(' { minSlotOfLocVars = v.slot } } } } if hasCapturedLocVars { return minSlotOfLocVars + 1 } else { return 0 } }
/* ______________ / false? jmp | / | while exp do block end <-' ^ \ |___________/ jmp */ func cgWhileStat(fi *funcInfo, node *WhileStat) { pcBeforeExp := fi.pc() oldRegs := fi.usedRegs a, _ := expToOpArg(fi, node.Exp, ArgReg) fi.usedRegs = oldRegs line := lastLineOf(node.Exp) fi.emitTest(line, a, 0) pcJmpToEnd := fi.emitJmp(line, 0, 0) fi.enterScope(true) cgBlock(fi, node.Block) fi.closeOpenUpvals(node.Block.LastLine) fi.emitJmp(node.Block.LastLine, 0, pcBeforeExp-fi.pc()-1) fi.exitScope(fi.pc()) fi.fixSbx(pcJmpToEnd, fi.pc()-pcJmpToEnd) } /* ______________ | false? jmp | V / repeat block until exp */ func cgRepeatStat(fi *funcInfo, node *RepeatStat) { fi.enterScope(true) pcBeforeBlock := fi.pc() cgBlock(fi, node.Block) oldRegs := fi.usedRegs a, _ := expToOpArg(fi, node.Exp, ArgReg) fi.usedRegs = oldRegs line := lastLineOf(node.Exp) fi.emitTest(line, a, 0) fi.emitJmp(line, fi.getJmpArgA(), pcBeforeBlock-fi.pc()-1) fi.closeOpenUpvals(line) fi.exitScope(fi.pc() + 1) }
这两个语句的核心即获取当前地址, 然后生成跳转地址实现循环. 然后对条件表达式进行求值, 决定是否跳出.
/* _________________ _________________ _____________ / false? jmp | / false? jmp | / false? jmp | / V / V / V if exp1 then block1 elseif exp2 then block2 elseif true then block3 end <-. \ \ \ | \_______________________\_______________________\_____| jmp jmp jmp */ func cgIfStat(fi *funcInfo, node *IfStat) { pcJmpToEnds := make([]int, len(node.Exps)) pcJmpToNextExp := -1 for i, exp := range node.Exps { if pcJmpToNextExp >= 0 { fi.fixSbx(pcJmpToNextExp, fi.pc()-pcJmpToNextExp) } oldRegs := fi.usedRegs a, _ := expToOpArg(fi, exp, ArgReg) fi.usedRegs = oldRegs line := lastLineOf(exp) fi.emitTest(line, a, 0) pcJmpToNextExp = fi.emitJmp(line, 0, 0) block := node.Blocks[i] fi.enterScope(false) cgBlock(fi, block) fi.closeOpenUpvals(block.LastLine) fi.exitScope(fi.pc() + 1) if i < len(node.Exps)-1 { pcJmpToEnds[i] = fi.emitJmp(block.LastLine, 0, 0) } else { pcJmpToEnds[i] = pcJmpToNextExp } } for _, pc := range pcJmpToEnds { fi.fixSbx(pc, fi.pc()-pc) } }
求值, 执行代码块, 或求值, 跳转的过程.
func cgForNumStat(fi *funcInfo, node *ForNumStat) { forIndexVar := "(for index)" forLimitVar := "(for limit)" forStepVar := "(for step)" fi.enterScope(true) cgLocalVarDeclStat(fi, &LocalVarDeclStat{ NameList: []string{forIndexVar, forLimitVar, forStepVar}, ExpList: []Exp{node.InitExp, node.LimitExp, node.StepExp}, }) fi.addLocVar(node.VarName, fi.pc()+2) a := fi.usedRegs - 4 pcForPrep := fi.emitForPrep(node.LineOfDo, a, 0) cgBlock(fi, node.Block) fi.closeOpenUpvals(node.Block.LastLine) pcForLoop := fi.emitForLoop(node.LineOfFor, a, 0) fi.fixSbx(pcForPrep, pcForLoop-pcForPrep-1) fi.fixSbx(pcForLoop, pcForPrep-pcForLoop) fi.exitScope(fi.pc()) fi.fixEndPC(forIndexVar, 1) fi.fixEndPC(forLimitVar, 1) fi.fixEndPC(forStepVar, 1) } func cgForInStat(fi *funcInfo, node *ForInStat) { forGeneratorVar := "(for generator)" forStateVar := "(for state)" forControlVar := "(for control)" fi.enterScope(true) cgLocalVarDeclStat(fi, &LocalVarDeclStat{ //LastLine: 0, NameList: []string{forGeneratorVar, forStateVar, forControlVar}, ExpList: node.ExpList, }) for _, name := range node.NameList { fi.addLocVar(name, fi.pc()+2) } pcJmpToTFC := fi.emitJmp(node.LineOfDo, 0, 0) cgBlock(fi, node.Block) fi.closeOpenUpvals(node.Block.LastLine) fi.fixSbx(pcJmpToTFC, fi.pc()-pcJmpToTFC) line := lineOf(node.ExpList[0]) rGenerator := fi.slotOfLocVar(forGeneratorVar) fi.emitTForCall(line, rGenerator, len(node.NameList)) fi.emitTForLoop(line, rGenerator+2, pcJmpToTFC-fi.pc()-1) fi.exitScope(fi.pc() - 1) fi.fixEndPC(forGeneratorVar, 2) fi.fixEndPC(forStateVar, 2) fi.fixEndPC(forControlVar, 2) }
for 循环需要三个特殊局部变量, 用于存储索引, 限制, 步长. 然后执行 block, 最后修复偏移量.
func cgLocalVarDeclStat(fi *funcInfo, node *LocalVarDeclStat) { exps := removeTailNils(node.ExpList) nExps := len(exps) nNames := len(node.NameList) oldRegs := fi.usedRegs if nExps == nNames { for _, exp := range exps { a := fi.allocReg() cgExp(fi, exp, a, 1) } } else if nExps > nNames { for i, exp := range exps { a := fi.allocReg() if i == nExps-1 && isVarargOrFuncCall(exp) { cgExp(fi, exp, a, 0) } else { cgExp(fi, exp, a, 1) } } } else { // nNames > nExps multRet := false for i, exp := range exps { a := fi.allocReg() if i == nExps-1 && isVarargOrFuncCall(exp) { multRet = true n := nNames - nExps + 1 cgExp(fi, exp, a, n) fi.allocRegs(n - 1) } else { cgExp(fi, exp, a, 1) } } if !multRet { n := nNames - nExps a := fi.allocRegs(n) fi.emitLoadNil(node.LastLine, a, n) } } fi.usedRegs = oldRegs startPC := fi.pc() + 1 for _, name := range node.NameList { fi.addLocVar(name, startPC) } }
第一部分是等号左右相等, 则分配临时变量, 进行表达式求值, 然后局部变量与寄存器绑定.
第二部分是表达式比名称多, 多余的也进行求值即可.
第三部分是表达式比变量名称少, 则要么多次赋值, 要么装入 nil.
func cgAssignStat(fi *funcInfo, node *AssignStat) { exps := removeTailNils(node.ExpList) nExps := len(exps) nVars := len(node.VarList) tRegs := make([]int, nVars) kRegs := make([]int, nVars) vRegs := make([]int, nVars) oldRegs := fi.usedRegs for i, exp := range node.VarList { if taExp, ok := exp.(*TableAccessExp); ok { tRegs[i] = fi.allocReg() cgExp(fi, taExp.PrefixExp, tRegs[i], 1) kRegs[i] = fi.allocReg() cgExp(fi, taExp.KeyExp, kRegs[i], 1) } else { name := exp.(*NameExp).Name if fi.slotOfLocVar(name) < 0 && fi.indexOfUpval(name) < 0 { // global var kRegs[i] = -1 if fi.indexOfConstant(name) > 0xFF { kRegs[i] = fi.allocReg() } } } } for i := 0; i < nVars; i++ { vRegs[i] = fi.usedRegs + i } if nExps >= nVars { for i, exp := range exps { a := fi.allocReg() if i >= nVars && i == nExps-1 && isVarargOrFuncCall(exp) { cgExp(fi, exp, a, 0) } else { cgExp(fi, exp, a, 1) } } } else { // nVars > nExps multRet := false for i, exp := range exps { a := fi.allocReg() if i == nExps-1 && isVarargOrFuncCall(exp) { multRet = true n := nVars - nExps + 1 cgExp(fi, exp, a, n) fi.allocRegs(n - 1) } else { cgExp(fi, exp, a, 1) } } if !multRet { n := nVars - nExps a := fi.allocRegs(n) fi.emitLoadNil(node.LastLine, a, n) } } lastLine := node.LastLine for i, exp := range node.VarList { if nameExp, ok := exp.(*NameExp); ok { varName := nameExp.Name if a := fi.slotOfLocVar(varName); a >= 0 { fi.emitMove(lastLine, a, vRegs[i]) } else if b := fi.indexOfUpval(varName); b >= 0 { fi.emitSetUpval(lastLine, vRegs[i], b) } else if a := fi.slotOfLocVar("_ENV"); a >= 0 { if kRegs[i] < 0 { b := 0x100 + fi.indexOfConstant(varName) fi.emitSetTable(lastLine, a, b, vRegs[i]) } else { fi.emitSetTable(lastLine, a, kRegs[i], vRegs[i]) } } else { // global var a := fi.indexOfUpval("_ENV") if kRegs[i] < 0 { b := 0x100 + fi.indexOfConstant(varName) fi.emitSetTabUp(lastLine, a, b, vRegs[i]) } else { fi.emitSetTabUp(lastLine, a, kRegs[i], vRegs[i]) } } } else { fi.emitSetTable(lastLine, tRegs[i], kRegs[i], vRegs[i]) } } // todo fi.usedRegs = oldRegs }
赋值语句复杂在可以多重赋值, 给局部变量, Upvalue, 全局变量赋值, 还可以根据索引修改表.
第一部分首先分配临时变量, 对表达式进行求值. tRegs, kRegs, vRegs 分别表示表, 键, 值分配的临时变量.
for i, exp := range node.VarList
这里开始处理索引赋值.
if nExps >= nVars
这里开始是多重赋值.
for i, exp := range node.VarList
这里做具体的赋值处理, 局部变量赋值使用 MOVE, Upvalue 赋值使用 SETUPVALUE, 全局变量生成 SETTABUP. 按索引则生成 SETTABLE, 最后还要释放临时变量.
Lua 表达式可分为字面量表达式, 构造器表达式, 运算符表达式, 前缀表达式, vararg 表达式.
cgExp 负责表达式编译.
// todo: rename to evalExp()? func cgExp(fi *funcInfo, node Exp, a, n int) { switch exp := node.(type) { case *NilExp: fi.emitLoadNil(exp.Line, a, n) case *FalseExp: fi.emitLoadBool(exp.Line, a, 0, 0) case *TrueExp: fi.emitLoadBool(exp.Line, a, 1, 0) case *IntegerExp: fi.emitLoadK(exp.Line, a, fi.indexOfConstant(exp.Val)) case *FloatExp: fi.emitLoadK(exp.Line, a, fi.indexOfConstant(exp.Val)) case *StringExp: fi.emitLoadK(exp.Line, a, fi.indexOfConstant(exp.Str)) case *ParensExp: cgExp(fi, exp.Exp, a, 1) case *VarargExp: cgVarargExp(fi, exp, a, n) case *FuncDefExp: cgFuncDefExp(fi, exp, a) case *TableConstructorExp: cgTableConstructorExp(fi, exp, a) case *UnopExp: cgUnopExp(fi, exp, a) case *BinopExp: cgBinopExp(fi, exp, a) case *ConcatExp: cgConcatExp(fi, exp, a) case *NameExp: cgNameExp(fi, exp, a) case *TableAccessExp: cgTableAccessExp(fi, exp, a) case *FuncCallExp: cgFuncCallExp(fi, exp, a, n) } }
字面量全部在函数中处理了, 执行相关的 load 指令即可.
func cgVarargExp(fi *funcInfo, node *VarargExp, a, n int) { if !fi.isVararg { panic("cannot use '...' outside a vararg function") } fi.emitVararg(node.Line, a, n) }
// f[a] := function(args) body end func cgFuncDefExp(fi *funcInfo, node *FuncDefExp, a int) { subFI := newFuncInfo(fi, node) fi.subFuncs = append(fi.subFuncs, subFI) for _, param := range node.ParList { subFI.addLocVar(param, 0) } cgBlock(subFI, node.Block) subFI.exitScope(subFI.pc() + 2) subFI.emitReturn(node.LastLine, 0, 0) bx := len(fi.subFuncs) - 1 fi.emitClosure(node.LastLine, a, bx) }
即, 生成新的 funcInfo, 然后跟上层 funcInfo 绑定, 然后将参数转化为局部变量, 然后进行 return, 最后添加 closure.
func cgTableConstructorExp(fi *funcInfo, node *TableConstructorExp, a int) { nArr := 0 for _, keyExp := range node.KeyExps { if keyExp == nil { nArr++ } } nExps := len(node.KeyExps) multRet := nExps > 0 && isVarargOrFuncCall(node.ValExps[nExps-1]) fi.emitNewTable(node.Line, a, nArr, nExps-nArr) arrIdx := 0 for i, keyExp := range node.KeyExps { valExp := node.ValExps[i] if keyExp == nil { arrIdx++ tmp := fi.allocReg() if i == nExps-1 && multRet { cgExp(fi, valExp, tmp, -1) } else { cgExp(fi, valExp, tmp, 1) } if arrIdx%50 == 0 || arrIdx == nArr { // LFIELDS_PER_FLUSH n := arrIdx % 50 if n == 0 { n = 50 } fi.freeRegs(n) line := lastLineOf(valExp) c := (arrIdx-1)/50 + 1 // todo: c > 0xFF if i == nExps-1 && multRet { fi.emitSetList(line, a, 0, c) } else { fi.emitSetList(line, a, n, c) } } continue } b := fi.allocReg() cgExp(fi, keyExp, b, 1) c := fi.allocReg() cgExp(fi, valExp, c, 1) fi.freeRegs(2) line := lastLineOf(valExp) fi.emitSetTable(line, a, b, c) } }
先初始化, 然后 emitNewTable(). 接下来开始构造表. 最后做关联处理并 emitSetTable().
// r[a] := op exp func cgUnopExp(fi *funcInfo, node *UnopExp, a int) { oldRegs := fi.usedRegs b, _ := expToOpArg(fi, node.Exp, ArgReg) fi.emitUnaryOp(node.Line, node.Op, a, b) fi.usedRegs = oldRegs } // r[a] = op r[b] func (cb *codeBuf) emitUnaryOp(line, op, a, b int) { switch op { case TOKEN_OP_NOT: cb.emitABC(line, OP_NOT, a, b, 0) case TOKEN_OP_BNOT: cb.emitABC(line, OP_BNOT, a, b, 0) case TOKEN_OP_LEN: cb.emitABC(line, OP_LEN, a, b, 0) case TOKEN_OP_UNM: cb.emitABC(line, OP_UNM, a, b, 0) } }
运算符表达式同样先申请临时变量, 对表达式求值, 最后释放临时变量并生成相应的一元运算符即可.
// r[a] := exp1 .. exp2 func cgConcatExp(fi *funcInfo, node *ConcatExp, a int) { for _, subExp := range node.Exps { a := fi.allocReg() cgExp(fi, subExp, a, 1) } c := fi.usedRegs - 1 b := c - len(node.Exps) + 1 fi.freeRegs(c - b + 1) fi.emitABC(node.Line, OP_CONCAT, a, b, c) }
由于拼接表达式被我们处理成了树状结构, 因此每次循环先进行求值, 然后执行 emitABC.
// r[a] := exp1 op exp2 func cgBinopExp(fi *funcInfo, node *BinopExp, a int) { switch node.Op { case TOKEN_OP_AND, TOKEN_OP_OR: oldRegs := fi.usedRegs b, _ := expToOpArg(fi, node.Exp1, ArgReg) fi.usedRegs = oldRegs if node.Op == TOKEN_OP_AND { fi.emitTestSet(node.Line, a, b, 0) } else { fi.emitTestSet(node.Line, a, b, 1) } pcOfJmp := fi.emitJmp(node.Line, 0, 0) b, _ = expToOpArg(fi, node.Exp2, ArgReg) fi.usedRegs = oldRegs fi.emitMove(node.Line, a, b) fi.fixSbx(pcOfJmp, fi.pc()-pcOfJmp) default: oldRegs := fi.usedRegs b, _ := expToOpArg(fi, node.Exp1, ArgRK) c, _ := expToOpArg(fi, node.Exp2, ArgRK) fi.emitBinaryOp(node.Line, node.Op, a, b, c) fi.usedRegs = oldRegs } }
// r[a] := name func cgNameExp(fi *funcInfo, node *NameExp, a int) { if r := fi.slotOfLocVar(node.Name); r >= 0 { fi.emitMove(node.Line, a, r) } else if idx := fi.indexOfUpval(node.Name); idx >= 0 { fi.emitGetUpval(node.Line, a, idx) } else { // x => _ENV['x'] taExp := &TableAccessExp{ LastLine: node.Line, PrefixExp: &NameExp{node.Line, "_ENV"}, KeyExp: &StringExp{node.Line, node.Name}, } cgTableAccessExp(fi, taExp, a) } }
同样区分局部, Upvalue, 全局变量处理即可.
// r[a] := f(args) func cgFuncCallExp(fi *funcInfo, node *FuncCallExp, a, n int) { nArgs := prepFuncCall(fi, node, a) fi.emitCall(node.Line, a, nArgs, n) } // return f(args) func cgTailCallExp(fi *funcInfo, node *FuncCallExp, a int) { nArgs := prepFuncCall(fi, node, a) fi.emitTailCall(node.Line, a, nArgs) } func prepFuncCall(fi *funcInfo, node *FuncCallExp, a int) int { nArgs := len(node.Args) lastArgIsVarargOrFuncCall := false cgExp(fi, node.PrefixExp, a, 1) if node.NameExp != nil { fi.allocReg() c, k := expToOpArg(fi, node.NameExp, ArgRK) fi.emitSelf(node.Line, a, a, c) if k == ArgReg { fi.freeRegs(1) } } for i, arg := range node.Args { tmp := fi.allocReg() if i == nArgs-1 && isVarargOrFuncCall(arg) { lastArgIsVarargOrFuncCall = true cgExp(fi, arg, tmp, -1) } else { cgExp(fi, arg, tmp, 1) } } fi.freeRegs(nArgs) if node.NameExp != nil { fi.freeReg() nArgs++ } if lastArgIsVarargOrFuncCall { nArgs = -1 } return nArgs }
func toProto(fi *funcInfo) *Prototype { proto := &Prototype{ LineDefined: uint32(fi.line), LastLineDefined: uint32(fi.lastLine), NumParams: byte(fi.numParams), MaxStackSize: byte(fi.maxRegs), Code: fi.insts, Constants: getConstants(fi), Upvalues: getUpvalues(fi), Protos: toProtos(fi.subFuncs), LineInfo: fi.lineNums, LocVars: getLocVars(fi), UpvalueNames: getUpvalueNames(fi), } if fi.line == 0 { proto.LastLineDefined = 0 } if proto.MaxStackSize < 2 { proto.MaxStackSize = 2 // todo } if fi.isVararg { proto.IsVararg = 1 // todo } return proto } func toProtos(fis []*funcInfo) []*Prototype { protos := make([]*Prototype, len(fis)) for i, fi := range fis { protos[i] = toProto(fi) } return protos } func getConstants(fi *funcInfo) []interface{} { consts := make([]interface{}, len(fi.constants)) for k, idx := range fi.constants { consts[idx] = k } return consts } func getLocVars(fi *funcInfo) []LocVar { locVars := make([]LocVar, len(fi.locVars)) for i, locVar := range fi.locVars { locVars[i] = LocVar{ VarName: locVar.name, StartPC: uint32(locVar.startPC), EndPC: uint32(locVar.endPC), } } return locVars } func getUpvalues(fi *funcInfo) []Upvalue { upvals := make([]Upvalue, len(fi.upvalues)) for _, uv := range fi.upvalues { if uv.locVarSlot >= 0 { // instack upvals[uv.index] = Upvalue{1, byte(uv.locVarSlot)} } else { upvals[uv.index] = Upvalue{0, byte(uv.upvalIndex)} } } return upvals } func getUpvalueNames(fi *funcInfo) []string { names := make([]string, len(fi.upvalues)) for name, uv := range fi.upvalues { names[uv.index] = name } return names }
func Compile(chunk, chunkName string) *binchunk.Prototype { ast := parser.Parse(chunk, chunkName) proto := codegen.GenProto(ast) setSource(proto, chunkName) return proto } func setSource(proto *binchunk.Prototype, chunkName string) { proto.Source = chunkName for _, f := range proto.Protos { setSource(f, chunkName) } }